Ergodic methods in additive combinatorics 



Bryna Kra 

Abstract. Shortly after Szemerédi's proof that a set of positive upper density
contains arbitrarily long arithmetic progressions, Furstenberg gave a new
proof of this theorem using ergodic theory. This gave rise to the field of
ergodic Ramsey Theory, in which problems motivated by additive combinatorics
are proven using ergodic theory. Ergodic Ramsey Theory has since produced
combinatorial results, some of which have yet to be obtained by other means,
and has also given a deeper understanding of the structure of measure preserving
systems. We outline the ergodic theory background needed to understand
these results, with an emphasis on recent developments in ergodic theory and
the relation to recent developments in additive combinatorics.

These notes are based on four lectures given during the School on Additive
Combinatorics at the Centre de Recherches Mathématiques, Montréal in April,
2006. The talks were aimed at an audience without background in ergodic
theory. No attempt is made to include complete proofs of all statements and 
often the reader is referred to the original sources. Many of the proofs included 
are classic, included as an indication of which ingredients play a role in the 
developments of the past ten years. 



1. Combinatorics to ergodic theory 

1.1. Szemerédi's Theorem. Answering a long standing conjecture of Erdős
and Turán [11], Szemerédi [54] showed that a set $E \subset \mathbb{Z}$ with positive upper
density$^{1}$ contains arbitrarily long arithmetic progressions. Soon thereafter,
Furstenberg [16] gave a new proof of Szemerédi's Theorem using ergodic theory, and this
has led to the rich field of ergodic Ramsey theory. Before describing some of the
results in this subject, we motivate the use of ergodic theory for studying
combinatorial problems.

We start with the finite formulation of Szemerédi's Theorem:

Theorem 1.1 (Szemerédi [54]). Given $\delta > 0$ and $k \in \mathbb{N}$, there is a function
$N(\delta, k)$ such that if $N > N(\delta, k)$ and $E \subset \{1, \dots, N\}$ is a subset with $|E| \geq \delta N$,
then $E$ contains an arithmetic progression of length $k$.

It is clear that this statement immediately implies the first formulation of
Szemerédi's Theorem, and a compactness argument gives the converse implication.

1.2. Translation to a probability system. Starting with Szemerédi's Theorem,
one gains insight into the intersection of sets in a probability system$^{2}$:

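Corollary 1.2. Let $\delta > 0$, $k \in \mathbb{N}$, $(X, \mathcal{X}, \mu)$ be a probability space and
$A_1, \dots, A_N \in \mathcal{X}$ with $\mu(A_i) \geq \delta$ for $i = 1, \dots, N$. If $N > N(\delta, k)$, then there
exist $a, d \in \mathbb{N}$ such that

$$A_a \cap A_{a+d} \cap A_{a+2d} \cap \dots \cap A_{a+kd} \neq \emptyset .$$

Proof. For $A \subset X$, let $\mathbf{1}_A(x)$ denote the characteristic function of $A$ (meaning
that $\mathbf{1}_A(x)$ is $1$ for $x \in A$ and is $0$ otherwise). Then

$$\int_X \frac{1}{N} \sum_{n=1}^{N} \mathbf{1}_{A_n} \, d\mu \geq \delta .$$

Thus there exists $x \in X$ with $\frac{1}{N}\sum_{n=1}^{N} \mathbf{1}_{A_n}(x) \geq \delta$. Then $E = \{n : x \in A_n\}$
satisfies $|E| \geq \delta N$, and so Szemerédi's Theorem implies that $E$ contains an
arithmetic progression of length $k$. □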
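
The averaging step in this proof can be checked by hand on a finite probability space. The following Python sketch is illustrative only and is not part of the original notes: the names `dense_index_set` and `random_sets` are hypothetical, and the space is assumed to be $\{0, \dots, M-1\}$ with the uniform measure. It picks a point $x$ lying in as many of the sets as possible and returns $E = \{n : x \in A_n\}$. Since the counts average to at least $\delta N$, the largest count satisfies $|E| \geq \delta N$.

```python
import random


def dense_index_set(sets, space_size):
    """Given subsets A_1..A_N of {0, ..., space_size-1} (uniform measure),
    return (x, E) where x lies in as many sets as possible and
    E = {n : x in A_n} (indices start at 1, as in Corollary 1.2)."""
    counts = [0] * space_size
    for s in sets:
        for x in s:
            counts[x] += 1
    # The maximum of the counts is at least their average (pigeonhole).
    x = max(range(space_size), key=counts.__getitem__)
    E = [n for n, s in enumerate(sets, start=1) if x in s]
    return x, E


def random_sets(space_size, set_size, num_sets, seed=0):
    """Hypothetical test data: num_sets random subsets, each of measure
    set_size / space_size."""
    rng = random.Random(seed)
    return [set(rng.sample(range(space_size), set_size))
            for _ in range(num_sets)]
```

With `sets = random_sets(50, 15, 40)`, the call `dense_index_set(sets, 50)` returns an index set `E` with `len(E) * 50 >= 15 * 40`, that is, $|E| \geq \delta N$ for $\delta = 15/50$ and $N = 40$.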

1.3. Measure preserving systems. A probability measure preserving system
is a quadruple $(X, \mathcal{X}, \mu, T)$, where $(X, \mathcal{X}, \mu)$ is a probability space and $T\colon X \to X$
is a bijective, measurable, measure preserving transformation. This means that for
all $A \in \mathcal{X}$, $T^{-1}A \in \mathcal{X}$ and

$$\mu(T^{-1}A) = \mu(A) .$$

In general, we refer to a probability measure preserving system as a system.

Without loss of generality, we can place several simplifying assumptions on
our systems. We assume that $\mathcal{X}$ is countably generated; thus for $1 \leq p < \infty$,
$L^p(\mu)$ is separable. We implicitly assume that all sets and functions are measurable
with respect to the appropriate $\sigma$-algebra, even when this is not explicitly stated.
Equality between sets or functions is meant up to sets of measure 0.

1.4. Furstenberg multiple recurrence. In a system, one can use Szemerédi's
Theorem to derive a bit more information about intersections of sets. If $(X, \mathcal{X}, \mu, T)$
is a system and $A \in \mathcal{X}$ with $\mu(A) \geq \delta > 0$, then

$$A, \; T^{-1}A, \; T^{-2}A, \; \dots, \; T^{-n}A, \; \dots$$

are all sets of measure $\geq \delta$. Applying Corollary 1.2 to this sequence of sets, we
have the existence of $a, d \in \mathbb{N}$ with

$$T^{-a}A \cap T^{-(a+d)}A \cap T^{-(a+2d)}A \cap \dots \cap T^{-(a+kd)}A \neq \emptyset .$$

Furthermore, the measure of this intersection must be positive. If not, we could
remove from $A$ a subset of measure zero containing all the intersections and obtain
a subset of measure at least $\delta$ without this property. In this way, starting with
Szemerédi's Theorem, we have derived Furstenberg's multiple recurrence theorem:

Theorem 1.3 (Furstenberg [16]). Let $(X, \mathcal{X}, \mu, T)$ be a system, and let $A \in \mathcal{X}$
with $\mu(A) > 0$. Then for any $k \geq 1$, there exists $n \in \mathbb{N}$ such that

$$\mu(A \cap T^{-n}A \cap T^{-2n}A \cap \dots \cap T^{-kn}A) > 0 . \tag{1.1}$$



$^{2}$ By a probability system, we mean a triple $(X, \mathcal{X}, \mu)$ where $X$ is a measure space, $\mathcal{X}$ is a
$\sigma$-algebra of measurable subsets of $X$, and $\mu$ is a probability measure. In general, we use the
convention of denoting the $\sigma$-algebra $\mathcal{X}$ by the associated calligraphic version of the measure
space $X$.

2. Ergodic theory to combinatorics

2.1. Strong form of multiple recurrence. We have seen that Furstenberg
multiple recurrence can be easily derived from Szemerédi's Theorem. More
interesting is the converse implication, showing that one can use ergodic theory to prove
regularity properties of subsets of the integers. This approach has two major
components, and has since been used to deduce other patterns in subsets of integers
with positive upper density (these results are discussed in a later section). The first
is proving a certain recurrence statement in ergodic theory, like that of Theorem 1.3.
The second is showing that this statement implies a corresponding statement about
subsets of the integers. We now make this more precise.

To use ergodic theory to show that some intersection of sets has positive mea- 
sure, it is natural to average the expression under consideration. This leads us to 
the strong form of Furstenberg's multiple recurrence: 

Theorem 2.1 (Furstenberg [16]). Let $(X, \mathcal{X}, \mu, T)$ be a system and let $A \in \mathcal{X}$
with $\mu(A) > 0$. Then for any $k \geq 1$,

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T^{-n}A \cap T^{-2n}A \cap \dots \cap T^{-kn}A) \tag{2.1}$$

is positive.

In particular, this implies the existence of infinitely many $n \in \mathbb{N}$ such that
the intersection in (1.1) has positive measure, and Theorem 1.3 follows. We return later to a
discussion of how to prove Theorem 2.1.

2.2. The correspondence principle. The second major component is using
this multiple recurrence statement to derive a statement about integers, such as
Szemerédi's Theorem. This is the content of Furstenberg's Correspondence Principle:

Theorem 2.2 (Furstenberg [16], [17]). Let $E \subset \mathbb{Z}$ have positive upper density.
There exist a system $(X, \mathcal{X}, \mu, T)$ and a set $A \in \mathcal{X}$ with $\mu(A) = d^*(E)$ such that

$$\mu(T^{-m_1}A \cap \dots \cap T^{-m_k}A) \leq d^*\bigl((E + m_1) \cap \dots \cap (E + m_k)\bigr)$$

for all $k \in \mathbb{N}$ and all $m_1, \dots, m_k \in \mathbb{Z}$.

Proof. Let $X = \{0, 1\}^{\mathbb{Z}}$ be endowed with the product topology and the shift
map $T$ given by $Tx(n) = x(n+1)$ for all $n \in \mathbb{Z}$. A point of $X$ is thus a sequence
$x = \{x(n)\}_{n \in \mathbb{Z}}$, and the distance between two points $x = \{x(n)\}_{n \in \mathbb{Z}}$, $y = \{y(n)\}_{n \in \mathbb{Z}} \in X$
is defined to be $0$ if $x = y$ and $2^{-k}$ if $x \neq y$ and $k = \min\{|n| : x(n) \neq y(n)\}$. Define
$a \in \{0, 1\}^{\mathbb{Z}}$ by

$$a(n) = \begin{cases} 1 & \text{if } n \in E \\ 0 & \text{otherwise} \end{cases}$$

and let $A = \{x \in X : x(0) = 1\}$. Thus $A$ is a clopen (closed and open) set.
For all $n \in \mathbb{Z}$,

$$T^{n}a \in A \text{ if and only if } n \in E .$$

By definition of $d^*(E)$, there exist sequences $\{M_i\}$ and $\{N_i\}$ of integers with
$N_i \to \infty$ such that

$$\lim_{i \to \infty} \frac{1}{N_i} \bigl| E \cap [M_i, M_i + N_i - 1] \bigr| = d^*(E) .$$

Then

$$\lim_{i \to \infty} \frac{1}{N_i} \sum_{n=M_i}^{M_i+N_i-1} \mathbf{1}_A(T^n a) = \lim_{i \to \infty} \frac{1}{N_i} \sum_{n=M_i}^{M_i+N_i-1} \mathbf{1}_E(n) = d^*(E) .$$

Let $\mathcal{C}$ be the countable algebra generated by cylinder sets, meaning sets that
are defined by specifying finitely many coordinates of each element and leaving the
others free. We can define an additive measure $\mu$ on $\mathcal{C}$ by

$$\mu(B) = \lim_{i \to \infty} \frac{1}{N_i} \sum_{n=M_i}^{M_i+N_i-1} \mathbf{1}_B(T^n a) ,$$

where we pass to subsequences $\{N_i\}$, $\{M_i\}$ such that this limit exists for all $B \in \mathcal{C}$.
(Note that $\mathcal{C}$ is countable and so by diagonalization we can arrange it such that
this limit exists for all elements of $\mathcal{C}$.)

We can extend the additive measure to a $\sigma$-additive measure $\mu$ on all Borel
sets $\mathcal{X}$ in $X$, which is exactly the $\sigma$-algebra generated by $\mathcal{C}$. Then $\mu$ is an invariant
measure, meaning that for all $B \in \mathcal{C}$,

$$\mu(T^{-1}B) = \lim_{i \to \infty} \frac{1}{N_i} \sum_{n=M_i}^{M_i+N_i-1} \mathbf{1}_B(T^{n+1} a) = \mu(B) .$$

Furthermore,

$$\mu(A) = \lim_{i \to \infty} \frac{1}{N_i} \sum_{n=M_i}^{M_i+N_i-1} \mathbf{1}_A(T^n a) = d^*(E) .$$

If $m_1, \dots, m_k \in \mathbb{Z}$, then the set $T^{-m_1}A \cap \dots \cap T^{-m_k}A$ is a clopen set, its indicator
function is continuous, and

$$\mu(T^{-m_1}A \cap \dots \cap T^{-m_k}A) = \lim_{i \to \infty} \frac{1}{N_i} \sum_{n=M_i}^{M_i+N_i-1} \mathbf{1}_{T^{-m_1}A \cap \dots \cap T^{-m_k}A}(T^n a)$$

$$= \lim_{i \to \infty} \frac{1}{N_i} \sum_{n=M_i}^{M_i+N_i-1} \mathbf{1}_{(E+m_1) \cap \dots \cap (E+m_k)}(n) \leq d^*\bigl((E + m_1) \cap \dots \cap (E + m_k)\bigr) .$$

□
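
The dictionary between the shift system and the set $E$ used in this proof can be made concrete. The following Python sketch is an illustration under stated assumptions, not part of the original notes: all function names are hypothetical, sequences are modelled as Python functions on the integers, and the search is restricted to a finite window of nonnegative $m$. It encodes $E$ as the point $a$, applies the shift $Tx(n) = x(n+1)$, and checks membership of $T^m a$ in $A \cap T^{-n}A \cap \dots \cap T^{-kn}A$, which holds exactly when $m, m+n, \dots, m+kn \in E$.

```python
def indicator_sequence(E):
    """The point a of {0,1}^Z with a(n) = 1 if n in E and 0 otherwise."""
    E = frozenset(E)
    return lambda n: 1 if n in E else 0


def shift(x, m=1):
    """The m-th power of the shift map: (T^m x)(n) = x(n + m)."""
    return lambda n: x(n + m)


def in_A(x):
    """Membership in the clopen set A = {x : x(0) = 1}."""
    return x(0) == 1


def in_multiple_intersection(x, n, k):
    """x lies in A, T^{-n}A, ..., T^{-kn}A, i.e. T^{jn}x is in A for j = 0..k."""
    return all(in_A(shift(x, j * n)) for j in range(k + 1))


def find_progression(E, k, search_bound):
    """Search 1 <= n < search_bound and 0 <= m < search_bound such that
    T^m a lies in the multiple intersection; return the progression
    m, m+n, ..., m+kn, or None if the finite search finds nothing."""
    a = indicator_sequence(E)
    for n in range(1, search_bound):
        for m in range(search_bound):
            if in_multiple_intersection(shift(a, m), n, k):
                return [m + j * n for j in range(k + 1)]
    return None
```

For example, `find_progression({1, 4, 5, 9, 13, 20}, 2, 25)` returns `[1, 5, 9]`, while `find_progression({0, 1, 3, 4}, 2, 10)` returns `None` because that set contains no three-term progression.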

We use this to deduce Szemerédi's Theorem from Theorem 1.3. As in the proof
of the Correspondence Principle, define $a \in \{0, 1\}^{\mathbb{Z}}$ by

$$a(n) = \begin{cases} 1 & \text{if } n \in E \\ 0 & \text{otherwise} \end{cases}$$

and set $A = \{x \in \{0, 1\}^{\mathbb{Z}} : x(0) = 1\}$. Thus $T^n a \in A$ if and only if $n \in E$.
By Theorem 1.3 there exists $n \in \mathbb{N}$ such that

$$\mu(A \cap T^{-n}A \cap T^{-2n}A \cap \dots \cap T^{-kn}A) > 0 .$$

Therefore for some $m \in \mathbb{N}$, $T^m a$ enters this multiple intersection and so

$$a(m) = a(m + n) = a(m + 2n) = \dots = a(m + kn) = 1 .$$

But this means that

$$m, \; m + n, \; m + 2n, \; \dots, \; m + kn \in E$$

and so we have an arithmetic progression of length $k + 1$ in $E$.

3. Convergence of multiple ergodic averages 

3.1. Convergence along arithmetic progressions. Furstenberg's multiple
recurrence theorem left open the question of the existence of the limit in (2.1). More
generally, one can ask if given a system $(X, \mathcal{X}, \mu, T)$ and $f_1, f_2, \dots, f_k \in L^\infty(\mu)$,
does

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} f_1(T^n x) \cdot f_2(T^{2n} x) \cdots f_k(T^{kn} x) \tag{3.1}$$

exist? Moreover, we can ask in what sense (in $L^2(\mu)$ or pointwise) does this limit
exist, and if it does exist, what can be said about the limit? Setting each function
$f_i$ to be the indicator function of a measurable set $A$, we are back in the context of
Furstenberg's Theorem.

For $k = 1$, existence of the limit in $L^2(\mu)$ is the mean ergodic theorem of von
Neumann. In Section 4.2 we give a proof of this statement. For $k = 2$, existence of
the limit in $L^2(\mu)$ was proven by Furstenberg [16] as part of his proof of Szemerédi's
Theorem. Furthermore, in the same paper he showed the existence of the limit in
$L^2(\mu)$ in a weak mixing system for arbitrary $k$; we define weak mixing in Section 5.5
and outline the proof for this case.

For $k \geq 3$, the proof requires a more subtle understanding of measure preserving
systems, and we begin discussing this case in a later section. Under some technical
hypotheses, the existence of the limit in $L^2(\mu)$ for $k = 3$ was first proven by Conze
and Lesigne (see [?] and [2]), then by Furstenberg and Weiss [22], and in the general
case by Host and Kra [32]. More generally, we showed the existence of the limit
for all $k \in \mathbb{N}$:

Theorem 3.1 (Host and Kra [32]). Let $(X, \mathcal{X}, \mu, T)$ be a system, let $k \in \mathbb{N}$,
and let $f_1, f_2, \dots, f_k \in L^\infty(\mu)$. Then the averages

$$\frac{1}{N} \sum_{n=0}^{N-1} f_1(T^n x) \cdot f_2(T^{2n} x) \cdots f_k(T^{kn} x)$$

converge in $L^2(\mu)$ as $N \to \infty$.

Such a convergence result for a finite system is trivial. For example, if $X = \mathbb{Z}/N\mathbb{Z}$,
then $\mathcal{X}$ consists of all subsets of $X$ and $\mu$ is the uniform probability
measure, meaning that the measure of a set is proportional to the cardinality of the
set. The transformation $T$ is given by $Tx = x + 1 \bmod N$. It is then trivial to check
the convergence of the average in (3.1). However, although the ergodic theory is
trivial in this case, there are common themes to be explored, and throughout these
notes, an effort is made to highlight the connection with recent advances in additive
combinatorics (see [39] for more on this connection). Of particular interest is the
role played by nilpotent groups, and homogeneous spaces of nilpotent groups, in
the proof of the ergodic statement. Some of these connections are further discussed
in the notes of Ben Green and Terry Tao.
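
To see why the finite case is trivial, note that on $\mathbb{Z}/M\mathbb{Z}$ the sequence $n \mapsto f_1(x+n) f_2(x+2n) \cdots f_k(x+kn)$ is periodic with period $M$, so the averages converge to the mean over one period. The following Python sketch illustrates this with exact rational arithmetic. It is not part of the original notes: the function names are hypothetical, and the modulus is written `M` to keep it apart from the averaging parameter `N`.

```python
from fractions import Fraction


def multiple_average(fs, x, N, M):
    """(1/N) * sum_{n=0}^{N-1} f_1(x+n) f_2(x+2n) ... f_k(x+kn) on Z/MZ
    with Tx = x + 1 mod M. Each f_j is a list of M integers or Fractions."""
    total = Fraction(0)
    for n in range(N):
        term = Fraction(1)
        for j, f in enumerate(fs, start=1):
            term *= f[(x + j * n) % M]
        total += term
    return total / N


def limit_of_average(fs, x, M):
    """The limit of the averages: the mean over one full period n = 0..M-1."""
    return multiple_average(fs, x, M, M)
```

When `N` is a multiple of `M` the average equals `limit_of_average(fs, x, M)` exactly, and for general `N` the difference is of order `M / N`.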

Much of the present notes is devoted to understanding the ingredients in the
proof of Theorem 3.1 and the role of nilpotent groups in this proof. Other
expository accounts of this proof can be found in [31] and in [40]. 2-step nilpotent
groups first appeared in the work of Conze-Lesigne in their proof of convergence for
$k = 3$, and a $(k-1)$-step nilpotent group plays a similar role in convergence for the
average in (3.1). Nilpotent groups also play some role in the combinatorial setup,
and this has been recently verified by Green and Tao (see [26], [27], and [28]) for
progressions of length 4 (which corresponds to the case $k = 3$ in (3.1)). For more
on this connection, see the lecture notes of Ben Green in this volume.

3.2. Other results. Using ergodic theory, other patterns have been shown to
exist in sets of positive upper density and we discuss these results in a later section. We
briefly summarize these results. A striking example is the theorem of Bergelson and
Leibman [6] showing the existence of polynomial patterns in such sets. Analogous
to the linear average corresponding to arithmetic progressions, existence of the
associated polynomial averages has been shown in [35] and [45]. One can also
average along 'cubes'; existence of these averages and a corresponding combinatorial
statement was shown in [34]. For commuting transformations, little is known and
these partial results are summarized in Section 9.1. An explicit formula for the limit
in (3.1) was given by Ziegler [56], who also has recently given a new proof [57] of
Theorem 3.1.

4. Single convergence (the case k = 1) 

4.1. Poincaré Recurrence. The case $k = 1$ in Furstenberg's multiple
recurrence (Theorem 1.3) is Poincaré Recurrence:

Theorem 4.1 (Poincaré [52]). If $(X, \mathcal{X}, \mu, T)$ is a system and $A \in \mathcal{X}$ with
$\mu(A) > 0$, then there exist infinitely many $n \in \mathbb{N}$ such that $\mu(A \cap T^{-n}A) > 0$.

Proof. Let $F = \{x \in A : T^{-n}x \notin A \text{ for all } n \geq 1\}$. Then $F \cap T^{-n}F = \emptyset$
for all $n \geq 1$: if $x$ lay in this intersection, then $T^n x \in F$ would force
$x = T^{-n}(T^n x) \notin A$, contradicting $x \in F \subset A$. This implies that for all integers $n \neq m$,

$$T^{-m}F \cap T^{-n}F = \emptyset .$$

In particular, $F, T^{-1}F, T^{-2}F, \dots$ are all pairwise disjoint sets and each set in this
sequence has measure equal to $\mu(F)$. If $\mu(F) > 0$, then

$$\mu\Bigl(\bigcup_{n \geq 0} T^{-n}F\Bigr) = \sum_{n \geq 0} \mu(F) = \infty ,$$

a contradiction of $\mu$ being a probability measure.

Therefore $\mu(F) = 0$ and the statement is proven. □

In fact the same proof shows a bit more: by a simple modification of the
definition of $F$, we have that $\mu$-almost every $x \in A$ returns to $A$ infinitely often.

4.2. The von Neumann Ergodic Theorem. Although the proof of Poincaré
Recurrence is simple, unfortunately there seems to be no way to generalize it for
multiple recurrence. Instead we prove a stronger statement, taking the average of
the expression under consideration and showing that the lim inf of this average is
positive. It is not any harder (for $k = 1$ only!) to show that the limit of this
average exists (and is positive). This is the content of the von Neumann mean ergodic
theorem. We first give the statement in a general Hilbert space:

Theorem 4.2 (von Neumann [55]). If $U$ is an isometry of a Hilbert space
$\mathcal{H}$ and $P$ is the orthogonal projection onto the $U$-invariant subspace
$\mathcal{I} = \{f \in \mathcal{H} : Uf = f\}$, then for all $f \in \mathcal{H}$,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} U^n f = Pf .$$

Thus the case $k = 1$ in Theorem 3.1 is an immediate corollary of Theorem 4.2.
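Proof. If $f \in \mathcal{I}$, then

$$\frac{1}{N} \sum_{n=0}^{N-1} U^n f = f$$

for all $N \in \mathbb{N}$ and so obviously the average converges to $f$. On the other hand, if
$f = g - Ug$ for some $g \in \mathcal{H}$, then

$$\sum_{n=0}^{N-1} U^n f = g - U^N g$$

and so the average converges to $0$ as $N \to \infty$. Setting $\mathcal{J} = \{g - Ug : g \in \mathcal{H}\}$ and
taking $f_k \in \mathcal{J}$ and $f_k \to f \in \overline{\mathcal{J}}$, then

$$\Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} U^n f \Bigr\| \leq \Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} U^n (f - f_k) \Bigr\| + \Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} U^n f_k \Bigr\|$$

$$\leq \Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} U^n \Bigr\| \cdot \| f - f_k \| + \Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} U^n f_k \Bigr\| .$$

Thus for $f \in \overline{\mathcal{J}}$ the average $\frac{1}{N} \sum_{n=0}^{N-1} U^n f$ converges to $0$ as $N \to \infty$.

We now show that an arbitrary $f \in \mathcal{H}$ can be written as a combination of
functions which exhibit these behaviors, meaning that any $f \in \mathcal{H}$ can be written
as $f = f_1 + f_2$ for some $f_1 \in \mathcal{I}$ and $f_2 \in \overline{\mathcal{J}}$. If $h \in \mathcal{J}^{\perp}$, then for all $g \in \mathcal{H}$,

$$0 = \langle h, g - Ug \rangle = \langle h, g \rangle - \langle h, Ug \rangle = \langle h, g \rangle - \langle U^*h, g \rangle = \langle h - U^*h, g \rangle$$

and so $h = U^*h$ and $h = Uh$. Conversely, reversing the steps we have that if $h \in \mathcal{I}$,
then $h \in \mathcal{J}^{\perp}$.

Since $\overline{\mathcal{J}}^{\perp} = \mathcal{J}^{\perp}$, we have

$$\mathcal{H} = \mathcal{I} \oplus \overline{\mathcal{J}} .$$

Thus writing $f = f_1 + f_2$ with $f_1 \in \mathcal{I}$ and $f_2 \in \overline{\mathcal{J}}$, we have

$$\frac{1}{N} \sum_{n=0}^{N-1} U^n f = \frac{1}{N} \sum_{n=0}^{N-1} U^n f_1 + \frac{1}{N} \sum_{n=0}^{N-1} U^n f_2 .$$

The first average equals $f_1 = Pf$ for every $N$ and the second converges to $0$. □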



Under a mild hypothesis on the system, we have an explicit formula for the
limit. Let $(X, \mathcal{X}, \mu, T)$ be a system. A subset $A \subset X$ is invariant if $T^{-1}A = A$.
The invariant sets form a sub-$\sigma$-algebra $\mathcal{I}$ of $\mathcal{X}$. The system $(X, \mathcal{X}, \mu, T)$ is said to
be ergodic if $\mathcal{I}$ is trivial, meaning that every invariant set has measure $0$ or measure
$1$.

A measure preserving transformation $T\colon X \to X$ defines a linear operator
$U_T \colon L^2(\mu) \to L^2(\mu)$ by

$$(U_T f)(x) = f(Tx) .$$

It is easy to check that the operator $U_T$ is a unitary operator (meaning its adjoint
is equal to its inverse). In a standard abuse of notation, we use the same letter to
denote the operator and the transformation, writing $Tf(x) = f(Tx)$ instead of the
more cumbersome $U_T f(x) = f(Tx)$.
We have:

Corollary 4.3. If $(X, \mathcal{X}, \mu, T)$ is a system and $f \in L^2(\mu)$, then

$$\frac{1}{N} \sum_{n=0}^{N-1} f(T^n x)$$

converges in $L^2(\mu)$, as $N \to \infty$, to a $T$-invariant function $\tilde{f}$. If the system is
ergodic, then the limit is the constant function $\int f \, d\mu$.

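Let $(X, \mathcal{X}, \mu, T)$ be an ergodic system and let $A, B \in \mathcal{X}$. Taking $f = \mathbf{1}_A$ in
Corollary 4.3 and integrating with respect to $\mu$ over a set $B$, we have:

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_B \mathbf{1}_A(T^n x) \, d\mu(x) = \int_B \Bigl( \int \mathbf{1}_A(y) \, d\mu(y) \Bigr) d\mu(x) .$$

This means that

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T^{-n}B) = \mu(A)\mu(B) .$$

In fact, one can check that this condition is equivalent to ergodicity.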

As already discussed, convergence in the case of the finite system $\mathbb{Z}/N\mathbb{Z}$ with the
transformation of adding $1 \bmod N$ is trivial. Furthermore this system is ergodic.
More generally, any permutation on $\mathbb{Z}/N\mathbb{Z}$ can be expressed as a product of disjoint
cyclic permutations. These permutations are the 'indecomposable' invariant subsets
of an arbitrary transformation on $\mathbb{Z}/N\mathbb{Z}$ and the restriction of the transformation
to one of these subsets is ergodic.
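
For a permutation of a finite set, both the decomposition into ergodic components and the conclusion of the mean ergodic theorem can be computed directly. The following Python sketch is an illustration only, with hypothetical function names, and assumes the uniform measure on $\{0, \dots, m-1\}$. The invariant functions are those constant on each cycle, so the projection $P$ replaces $f$ by its mean over each cycle, and the ergodic averages along an orbit converge to that mean.

```python
from fractions import Fraction


def cycles(perm):
    """Disjoint cycles of a permutation given as a list with perm[x] = T(x)."""
    seen, result = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        result.append(cycle)
    return result


def ergodic_average(f, perm, x, N):
    """(1/N) * sum_{n=0}^{N-1} f(T^n x), computed exactly."""
    total = Fraction(0)
    for _ in range(N):
        total += f[x]
        x = perm[x]
    return total / N


def projection_onto_invariant(f, perm):
    """P f: the mean of f over each cycle (each ergodic component)."""
    Pf = [Fraction(0)] * len(perm)
    for cycle in cycles(perm):
        mean = Fraction(sum(f[x] for x in cycle), len(cycle))
        for x in cycle:
            Pf[x] = mean
    return Pf
```

When `N` is a common multiple of the cycle lengths, `ergodic_average(f, perm, x, N)` agrees exactly with `projection_onto_invariant(f, perm)[x]` at every point `x`.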

This idea of dividing a space into indecomposable components generalizes: an
arbitrary measure preserving system can be decomposed into, perhaps continuously
many, indecomposable components, and these are exactly the ergodic ones. Using
this ergodic decomposition (see, for example, [10]), instead of working with an
arbitrary system, we reduce most of the recurrence and convergence questions we
consider here to the same problem in an ergodic system.

5. Double convergence (the case k = 2) 

5.1. A model for double convergence. We now turn to the case of $k = 2$
in Theorem 3.1 and study convergence of the double average

$$\frac{1}{N} \sum_{n=0}^{N-1} f_1(T^n x) \cdot f_2(T^{2n} x)$$

for bounded functions $f_1$ and $f_2$. Our goal is to explain how a simple class of
systems, the rotations, suffice to understand convergence for the double average.

First we explicitly define what is meant by a rotation. Let $G$ be a compact
abelian group, with Borel $\sigma$-algebra $\mathcal{B}$, Haar measure $m$, and fix some $\alpha \in G$.
Define $T\colon G \to G$ by

$$Tx = x + \alpha .$$

The system $(G, \mathcal{B}, m, T)$ is called a group rotation. It is ergodic if and only if $\mathbb{Z}\alpha$
is dense in $G$. For example, when $X$ is the circle $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ and $\alpha \notin \mathbb{Q}$, the rotation
by $\alpha$ is ergodic.

The double average is the simplest example of a nonconventional ergodic
average: even for an ergodic system, the limit is not necessarily constant. This sort of
behavior does not occur for the single average of von Neumann's Theorem, where
we have seen that the limit is constant in an ergodic system. Even for the simple
example of an ergodic rotation, the limit of the double average is not constant:

Example 5.1. Let $X = \mathbb{T}$, with Borel $\sigma$-algebra and Haar measure, and let
$T\colon X \to X$ be the rotation $Tx = x + \alpha \bmod 1$. Setting $f_1(x) = \exp(4\pi i x)$ and
$f_2(x) = \exp(-2\pi i x)$, then for all $n \in \mathbb{N}$,

$$f_1(T^n x) \cdot f_2(T^{2n} x) = f_2(-x) .$$

In particular, this double average converges to a nonconstant function.

More generally, if $\alpha \notin \mathbb{Q}$ and $f_1, f_2 \in L^\infty(\mu)$, the double average converges to

$$\int_{\mathbb{T}} f_1(x + t) \cdot f_2(x + 2t) \, dt .$$

We shall see that Fourier analysis suffices to understand this average. By
taking both functions to be the indicator function of a set with positive measure
and integrating over this set, we then have that Fourier analysis suffices for the
study of arithmetic progressions of length 3, giving a proof of Roth's Theorem via
ergodic theory. Later we shall see that other more powerful methods are needed
to understand the average along longer progressions. In a similar vein, rotations
are the model for an ergodic average with 3 terms, but are not sufficient for more
terms. We introduce some terminology to make these notions more precise.

5.2. Factors. For the remainder of this section, we assume that $(X, \mathcal{X}, \mu, T)$
is an ergodic system.

A factor of a system $(X, \mathcal{X}, \mu, T)$ can be defined in one of several equivalent
ways. It is a $T$-invariant sub-$\sigma$-algebra $\mathcal{Y}$ of $\mathcal{X}$. A second characterization is that
a factor is a system $(Y, \mathcal{Y}, \nu, S)$ and a measurable map $\pi\colon X \to Y$, the factor
map, such that $\mu \circ \pi^{-1} = \nu$ and $S \circ \pi(x) = \pi \circ T(x)$ for $\mu$-almost every $x \in X$. A
third characterization is that a factor is a $T$-invariant subalgebra $\mathcal{F}$ of $L^\infty(\mu)$. One
can check that the first two definitions agree by identifying $\mathcal{Y}$ with $\pi^{-1}(\mathcal{Y})$, and
that the first and third agree by identifying $\mathcal{F}$ with $L^\infty(\mathcal{Y})$. When any of these
conditions holds, we say that $Y$, or the appropriate sub-$\sigma$-algebra, is a factor of $X$
and write $\pi\colon X \to Y$ for the factor map. We make use of a slight (and standard)
abuse of notation, using the same letter $T$ to denote both the transformation in
the original system and the transformation in the factor system. If the factor map
$\pi\colon X \to Y$ is also injective, we say that the two systems $(X, \mathcal{X}, \mu, T)$ and $(Y, \mathcal{Y}, \nu, S)$
are isomorphic.

For example, if $(X, \mathcal{X}, \mu, T)$ and $(Y, \mathcal{Y}, \nu, S)$ are systems, then each is a factor
of the product system $(X \times Y, \mathcal{X} \times \mathcal{Y}, \mu \times \nu, T \times S)$ and the associated factor map
is projection onto the appropriate coordinate.

A more interesting example can be given in the system $X = \mathbb{T} \times \mathbb{T}$, with Borel
$\sigma$-algebra and Haar measure, and transformation $T\colon X \to X$ given by

$$T(x, y) = (x + \alpha, y + x) .$$

Then $\mathbb{T}$ with the rotation $x \mapsto x + \alpha$ is a factor of $X$.
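
The intertwining relation $S \circ \pi = \pi \circ T$ for this last example can be checked mechanically. The following Python sketch is illustrative and not part of the original notes: the function names are hypothetical, points of $\mathbb{T}$ are represented by exact fractions in $[0, 1)$, and $\alpha$ is taken rational only so that the arithmetic is exact (the identity itself holds for every $\alpha$).

```python
from fractions import Fraction


def skew(point, alpha):
    """The transformation T(x, y) = (x + alpha, y + x) on T x T, mod 1."""
    x, y = point
    return ((x + alpha) % 1, (y + x) % 1)


def rotation(x, alpha):
    """The rotation S(x) = x + alpha on T, mod 1."""
    return (x + alpha) % 1


def projection(point):
    """The factor map pi(x, y) = x."""
    return point[0]
```

Starting from any point `p`, iterating `skew` on `p` and `rotation` on `projection(p)` in parallel keeps the two in agreement, which is exactly the statement that the rotation is a factor of the skew product.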

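5.3. Conditional expectation. If $\mathcal{Y}$ is a $T$-invariant sub-$\sigma$-algebra of $\mathcal{X}$ and
$f \in L^2(\mu)$, the conditional expectation $\mathbb{E}(f \mid \mathcal{Y})$ of $f$ with respect to $\mathcal{Y}$ is the function
on $Y$ defined by $\mathbb{E}(f \mid Y) \circ \pi = \mathbb{E}(f \mid \mathcal{Y})$. It is characterized as the $\mathcal{Y}$-measurable
function on $X$ such that

$$\int_X f(x) \cdot g(\pi(x)) \, d\mu(x) = \int_Y \mathbb{E}(f \mid Y)(y) \cdot g(y) \, d\nu(y)$$

for all $g \in L^\infty(\nu)$ and satisfies the identities

$$\int \mathbb{E}(f \mid \mathcal{Y}) \, d\mu = \int f \, d\mu$$

and

$$T\,\mathbb{E}(f \mid \mathcal{Y}) = \mathbb{E}(Tf \mid \mathcal{Y}) .$$

As an example, take $X = \mathbb{T} \times \mathbb{T}$ endowed with the transformation
$(x, y) \mapsto (x + \alpha, y + x)$. We have a factor $Z = \mathbb{T}$ endowed with the map $x \mapsto x + \alpha$.
Considering $f(x, y) = \exp(x) + \exp(y)$, where $\exp(t)$ is understood as the character
$e^{2\pi i t}$ so that $\int \exp(y)\,dy = 0$, we have $\mathbb{E}(f \mid \mathcal{Z}) = \exp(x)$. The factor
sub-$\sigma$-algebra $\mathcal{Z}$ is the $\sigma$-algebra of sets that depend only on the $x$ coordinate.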
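
In this example the conditional expectation onto $\mathcal{Z}$ amounts to integrating out the $y$ coordinate, which can be approximated numerically. The following Python sketch is an illustration under assumptions: the function names are hypothetical, $\exp(t)$ is interpreted as $e^{2\pi i t}$, and the integral over $y$ is replaced by a Riemann sum on an equally spaced grid (which is exact up to rounding for low-frequency characters).

```python
import cmath


def e(t):
    """The character t -> exp(2*pi*i*t) on the circle T = R/Z."""
    return cmath.exp(2j * cmath.pi * t)


def conditional_expectation_on_x(f, x, grid=64):
    """Approximate E(f | Z)(x), the integral of f(x, y) over y in T,
    by a Riemann sum over `grid` equally spaced points."""
    return sum(f(x, j / grid) for j in range(grid)) / grid
```

For `f = lambda x, y: e(x) + e(y)`, the value `conditional_expectation_on_x(f, x)` agrees with `e(x)` up to floating point error, because the grid average of `e(y)` vanishes.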

5.4. Characteristic factors. For $f_1, \dots, f_k \in L^\infty(\mu)$, we are interested in
convergence in $L^2(\mu)$ of:

$$\frac{1}{N} \sum_{n=0}^{N-1} T^n f_1 \cdot T^{2n} f_2 \cdots T^{kn} f_k . \tag{5.1}$$

Instead of working with the whole system $(X, \mathcal{X}, \mu, T)$, it turns out that it is easier
to find some factor of the system that characterizes this average, meaning that if we
have some means of understanding convergence of the average under consideration
in a well chosen factor, then we can also understand convergence of the same average
in the original system. This motivates the following definition.

A factor $Y$ of $X$ is characteristic for the average (5.1) if this average converges
to $0$ when $\mathbb{E}(f_i \mid \mathcal{Y}) = 0$ for some $i \in \{1, \dots, k\}$. This is equivalent to showing that
the difference between (5.1) and

$$\frac{1}{N} \sum_{n=0}^{N-1} T^n \mathbb{E}(f_1 \mid \mathcal{Y}) \cdot T^{2n} \mathbb{E}(f_2 \mid \mathcal{Y}) \cdots T^{kn} \mathbb{E}(f_k \mid \mathcal{Y})$$

converges to $0$ in $L^2(\mu)$.

By definition, the whole system is always a characteristic factor. Of course
nothing is gained by using such a characteristic factor, and the notion only
becomes useful when we can find a characteristic factor that has useful geometric
and/or algebraic properties. A very short outline of the proof of convergence of the
average (5.1) is as follows: find a characteristic factor that has sufficient structure
so as to allow one to prove convergence. We return to this idea later.

The definition of a characteristic factor can be extended for any other average
under consideration, with the obvious changes: the limit remains unchanged when
each function is replaced by its conditional expectation on this factor. This notion
has been implicit in the literature since Furstenberg's proof of Szemerédi's Theorem,
but the terminology was only introduced more recently in [22].



5.5. Weak mixing systems. The system $(X, \mathcal{X}, \mu, T)$ is weak mixing if for
all $A, B \in \mathcal{X}$,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \bigl| \mu(T^{-n}A \cap B) - \mu(A)\mu(B) \bigr| = 0 .$$

There are many equivalent formulations of this property, and we give a few
(see, for example [10]):

Proposition 5.2. Let $(X, \mathcal{X}, \mu, T)$ be a system. The following are equivalent:

(1) $(X, \mathcal{X}, \mu, T)$ is weak mixing.

(2) There exists $J \subset \mathbb{N}$ of density zero such that for all $A, B \in \mathcal{X}$

$$\mu(T^{-n}A \cap B) \to \mu(A)\mu(B) \text{ as } n \to \infty \text{ and } n \notin J .$$

(3) For all $A, B, C \in \mathcal{X}$ with $\mu(A)\mu(B)\mu(C) > 0$, there exists $n \in \mathbb{N}$ such that

$$\mu(A \cap T^{-n}B)\,\mu(A \cap T^{-n}C) > 0 .$$

(4) The system $(X \times X, \mathcal{X} \times \mathcal{X}, \mu \times \mu, T \times T)$ is ergodic.

Any system exhibiting rotational behavior (for example a rotation on a circle,
or a system with a nontrivial circle rotation as a factor) is not weak mixing. We
have already seen in Example 5.1 that weak mixing, or lack thereof, has an effect
on multiple averages. We give a second example to highlight this effect:

Example 5.3. Suppose that $X = X_1 \cup X_2 \cup X_3$ with $T(X_1) = X_2$, $T(X_2) = X_3$
and $T(X_3) = X_1$, and that $T^3$ restricted to $X_i$, for $i = 1, 2, 3$, is weak mixing. For
the double average

$$\frac{1}{N} \sum_{n=0}^{N-1} f_1(T^n x) \cdot f_2(T^{2n} x) ,$$

where $f_1, f_2 \in L^\infty(\mu)$, if $x \in X_1$, this average converges to

$$\frac{1}{3} \Bigl( \int f_1 \, d\mu_1 \int f_2 \, d\mu_1 + \int f_1 \, d\mu_2 \int f_2 \, d\mu_3 + \int f_1 \, d\mu_3 \int f_2 \, d\mu_2 \Bigr) ,$$

where $\mu_i$ denotes $\mu$ restricted to $X_i$ and normalized to be a probability measure.
A similar expression with obvious changes holds for $x \in X_2$ or $x \in X_3$.

The main point is that (for the double average) the answer depends on the
rotational behavior of the system. This example lacks weak mixing and so has a
nontrivial rotation factor. We now formalize this notion.



5.6. Kronecker factor. The Kronecker factor $(Z_1, \mathcal{Z}_1, m, T)$ of $(X, \mathcal{X}, \mu, T)$
is the sub-$\sigma$-algebra of $\mathcal{X}$ spanned by the eigenfunctions. A classical result is that
the Kronecker factor can be given the structure of a group rotation:

Theorem 5.4 (Halmos and von Neumann [30]). The Kronecker factor of a
system is isomorphic to a system $(Z_1, \mathcal{Z}_1, m, T)$, where $Z_1$ is a compact abelian
group, $\mathcal{Z}_1$ is its Borel $\sigma$-algebra, $m$ is the Haar measure, and $Tx = x + \alpha$ for some
fixed $\alpha \in Z_1$.

We use $\pi_1 \colon X \to Z_1$ to denote the factor map from a system $(X, \mathcal{X}, \mu, T)$ to
the Kronecker factor $(Z_1, \mathcal{Z}_1, m, T)$. Then any eigenfunction of $X$ takes the form

$$f(x) = c\,\gamma(\pi_1(x)) ,$$

where $c$ is a constant and $\gamma \in \widehat{Z_1}$ is a character of $Z_1$.
We give two examples of Kronecker factors:

Example 5.5. If $X = \mathbb{T} \times \mathbb{T}$, $\alpha \in \mathbb{T}$, and $T\colon X \to X$ is the map

$$T(x, y) = (x + \alpha, y + x) ,$$

then the rotation $x \mapsto x + \alpha$ on $\mathbb{T}$ is the Kronecker factor of $X$. It corresponds
to the pure point spectrum. (The spectrum in the orthogonal complement of the
Kronecker factor is countable Lebesgue.)

Example 5.6. If $X = \mathbb{T}^3$, $\alpha \in \mathbb{T}$, and $T\colon X \to X$ is the map

$$T(x, y, z) = (x + \alpha, y + x, z + y) ,$$

then again the rotation $x \mapsto x + \alpha$ on $\mathbb{T}$ is the Kronecker factor of $X$. This example
has the same pure point spectrum as the first example, but the first example is a
factor of the second example.

The Kronecker factor can be used to give another characterization of weak 
mixing: 

Theorem 5.7 (Koopman and von Neumann [38]). A system is not weak mixing
if and only if it has a nontrivial factor which is a rotation on a compact abelian
group.

The largest of these factors is the Kronecker factor. 

5.7. Convergence for k = 2. If we take into account the rotational behavior
in a system, meaning the Kronecker factor, then we can understand the limit of the
double average

$$\frac{1}{N} \sum_{n=0}^{N-1} T^n f_1 \cdot T^{2n} f_2 . \tag{5.2}$$

An obvious constraint is that for $\mu$-almost every $x$, the triple $(x, T^n x, T^{2n} x)$
projects to an arithmetic progression in the Kronecker factor $Z_1$. Furstenberg
proved that this obvious restriction is the only restriction, showing that to prove
convergence of the double average, one can assume that the system is an ergodic
rotation on a compact abelian group:

Theorem 5.8 (Furstenberg [16]). If $(X, \mathcal{X}, \mu, T)$ is an ergodic system,
$(Z_1, \mathcal{Z}_1, m, T)$ is its Kronecker factor, and $f_1, f_2 \in L^\infty(\mu)$, then the norm

$$\Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} T^n f_1 \cdot T^{2n} f_2 - \frac{1}{N} \sum_{n=0}^{N-1} T^n \mathbb{E}(f_1 \mid \mathcal{Z}_1) \cdot T^{2n} \mathbb{E}(f_2 \mid \mathcal{Z}_1) \Bigr\|_{L^2(\mu)}$$

tends to $0$ as $N \to \infty$.



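In our terminology, this theorem can be quickly summarized: the Kronecker
factor is characteristic for the double average. To prove the theorem, we use a
standard trick for averaging, which is an iterated use of a variation of the van der
Corput Lemma on differences (see [41] or [2]):

Lemma 5.9 (van der Corput). Let $\{u_n\}$ be a sequence in a Hilbert space with
$\|u_n\| \leq 1$ for all $n \in \mathbb{N}$. For $h \in \mathbb{N}$, set

$$\gamma_h = \limsup_{N \to \infty} \Bigl| \frac{1}{N} \sum_{n=0}^{N-1} \langle u_{n+h}, u_n \rangle \Bigr| .$$

Then

$$\limsup_{N \to \infty} \Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} u_n \Bigr\|^2 \leq \limsup_{H \to \infty} \frac{1}{H} \sum_{h=0}^{H-1} \gamma_h .$$

Proof. Given $\varepsilon > 0$ and $H \in \mathbb{N}$, for $N$ sufficiently large we have that

$$\Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} u_n - \frac{1}{N} \sum_{n=0}^{N-1} \frac{1}{H} \sum_{h=0}^{H-1} u_{n+h} \Bigr\| < \varepsilon .$$

By convexity,

$$\Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} \frac{1}{H} \sum_{h=0}^{H-1} u_{n+h} \Bigr\|^2 \leq \frac{1}{N} \sum_{n=0}^{N-1} \Bigl\| \frac{1}{H} \sum_{h=0}^{H-1} u_{n+h} \Bigr\|^2 = \frac{1}{N} \frac{1}{H^2} \sum_{n=0}^{N-1} \sum_{h_1, h_2 = 0}^{H-1} \langle u_{n+h_1}, u_{n+h_2} \rangle$$

and this approaches

$$\frac{1}{H^2} \sum_{h_1, h_2 = 0}^{H-1} \gamma_{h_1 - h_2}$$

as $N \to \infty$. But the assumption implies that this approaches $0$ as $H \to \infty$. □

We now use this in the proof of Furstenberg's Theorem: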



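Proof of Theorem 5.8. Without loss of generality, we assume that $\mathbb{E}(f_1 \mid \mathcal{Z}_1) = 0$
and we show that the double average converges to $0$. Set $u_n = T^n f_1 \cdot T^{2n} f_2$. Then

$$\langle u_n, u_{n+h} \rangle = \int T^n f_1 \cdot T^{2n} f_2 \cdot T^{n+h} \overline{f_1} \cdot T^{2n+2h} \overline{f_2} \, d\mu$$

$$= \int (f_1 \cdot T^h \overline{f_1}) \cdot T^n (f_2 \cdot T^{2h} \overline{f_2}) \, d\mu .$$

By the Ergodic Theorem,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \langle u_n, u_{n+h} \rangle$$

exists and is equal to

$$\int f_1 \cdot T^h \overline{f_1} \cdot P(f_2 \cdot T^{2h} \overline{f_2}) \, d\mu , \tag{5.3}$$

where $P$ is projection onto the $T$-invariant functions of $L^2(\mu)$. Since $T$ is ergodic, $P$
is projection onto the constant functions. But since $\mathbb{E}(f_1 \mid \mathcal{Z}_1) = 0$, $f_1$ is orthogonal
to the constant functions and so the integral in (5.3) is $0$. The van der Corput
Lemma immediately gives the result. □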

Furstenberg used a similar argument combined with induction to show that in
a weak mixing system, the average (5.1) converges to the product of the integrals in
$L^2(\mu)$ for all $k \geq 1$.

Finally, to show that a set of integers with positive upper density contains
arithmetic progressions of length three (Roth's Theorem), by Furstenberg's
Correspondence Principle it suffices to show double recurrence:

Theorem 5.10 (Theorem 1.3 for k = 2). Let $(X, \mathcal{X}, \mu, T)$ be an ergodic system,
and let $A \in \mathcal{X}$ with $\mu(A) > 0$. There exists $n \in \mathbb{N}$ with

$$\mu(A \cap T^{-n}A \cap T^{-2n}A) > 0 .$$

Proof. Let $f = \mathbf{1}_A$. Then

$$\mu(A \cap T^{-n}A \cap T^{-2n}A) = \int f \cdot T^n f \cdot T^{2n} f \, d\mu .$$

It suffices to show that 

JV-l 



J im JjY, f f-T n f-T 2n fd^ 

N^oo N z — ' / 
n=0 J 



is positive. 

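By Theorem 5.8, the limiting behavior of the double average $\frac{1}{N} \sum_{n=0}^{N-1} T^n f \cdot T^{2n} f$
is unchanged if $f$ is replaced by $E(f \mid \mathcal{Z}_1)$. Multiplying by $f$ and integrating,
it thus suffices to show that

$$(5.4) \qquad \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \int f \cdot T^n E(f \mid \mathcal{Z}_1) \cdot T^{2n} E(f \mid \mathcal{Z}_1) \, d\mu$$

is positive. Since $\mathcal{Z}_1$ is $T$-invariant, $T^n E(f \mid \mathcal{Z}_1) \cdot T^{2n} E(f \mid \mathcal{Z}_1)$ is measurable with
respect to $\mathcal{Z}_1$ and so we can replace (5.4) by

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \int E(f \mid \mathcal{Z}_1) \cdot T^n E(f \mid \mathcal{Z}_1) \cdot T^{2n} E(f \mid \mathcal{Z}_1) \, d\mu .$$

This means that we can assume that the first term is also measurable with respect
to the Kronecker factor, and so we can assume that $f$ is a nonnegative function that
is measurable with respect to the Kronecker factor. Thus the system $X$ can be assumed
to be $Z_1$ and the transformation $T$ is rotation by some irrational $\alpha$. Thus it suffices
to show that

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_{Z_1} f(s) \cdot f(s + n\alpha) \cdot f(s + 2n\alpha) \, dm(s)$$

is positive. Since $\{n\alpha\}$ is equidistributed in $Z_1$, this limit is equal to

$$(5.5) \qquad \iint_{Z_1 \times Z_1} f(s) \cdot f(s + t) \cdot f(s + 2t) \, dm(s) \, dm(t) .$$

But

$$\lim_{t \to 0} \int_{Z_1} f(s) \cdot f(s + t) \cdot f(s + 2t) \, dm(s) = \int_{Z_1} f(s)^3 \, dm(s) ,$$

which is clearly positive. In particular, the double integral in (5.5) is positive. □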

In the proof we have actually proven a stronger statement than needed to obtain
Roth's Theorem: we have shown the existence of the limit of the double average
in $L^2(\mu)$. Letting $\tilde{f} = E(f \mid \mathcal{Z}_1)$ for $f \in L^\infty(\mu)$, we have shown that the double
average (5.2) converges to

$$\int_{Z_1} \tilde{f}_1(\pi_1(x) + s) \cdot \tilde{f}_2(\pi_1(x) + 2s) \, dm(s) .$$

More generally, the same sort of argument can be used to show that in a weak
mixing system, the Kronecker factor is characteristic for the averages (3.1) for all
$k \ge 1$, meaning that to prove convergence of these averages in a weak mixing system
it suffices to assume that the system is a Kronecker system. Using Fourier analysis,
one then gets convergence of the averages (3.1) for weak mixing systems.

5.8. Multiple averages. We want to carry out similar analysis for the
multiple averages

$$\frac{1}{N} \sum_{n=0}^{N-1} T^n f_1 \cdot T^{2n} f_2 \cdots T^{kn} f_k$$

and show the existence of the limit in $L^2(\mu)$ as $N \to \infty$. In Furstenberg's proof of Szemeredi's
Theorem [16] and subsequent proofs of Szemeredi's Theorem via ergodic theory
such as [21], the approach of Section 5.7 is not the one used for $k \ge 3$. Namely,
they do not show the existence of the limit and then analyze the limit itself to show
it is positive. A weaker statement is proved, only giving that the liminf of (2.1) is
positive. We will not discuss the intricate structure theorem and induction needed
to prove this.

Already for convergence for $k = 3$, one needs to consider more than just
rotational behavior.

Example 5.11. Given a system $(X, T)$, let $F(Tx) = f(x) F(x)$, where
$f(Tx) = \lambda f(x)$ and $|\lambda| = 1$.

Then, since $f(T^j x) = \lambda^j f(x)$,

$$F(T^n x) = f(x) f(Tx) \cdots f(T^{n-1} x) F(x) = \lambda^{n(n-1)/2} (f(x))^n F(x)$$

and so

$$F(x) = (F(T^n x))^3 \, (F(T^{2n} x))^{-3} \, F(T^{3n} x) .$$

This means that there is some relation among

$$(x, T^n x, T^{2n} x, T^{3n} x)$$

that does not arise from the Kronecker factor.

One can construct more complicated examples (see Furstenberg [18]) that show
that even such generalized eigenfunctions do not suffice for determining the
limiting behavior for $k = 3$. More precisely, the factor corresponding to generalized
eigenfunctions (the Abramov factor) is not characteristic for the average (3.1) with
$k = 3$.

To understand the triple average, one needs to take into account systems more
complicated than such Kronecker and Abramov systems. The simplest such
example is a 2-step nilsystem (the use of this terminology will be clarified later):

Example 5.12. Let $X = \mathbb{T} \times \mathbb{T}$, with Borel $\sigma$-algebra, and Haar measure. Fix
$\alpha \in \mathbb{T}$ and define $T \colon X \to X$ by

$$T(x, y) = (x + \alpha, y + x) .$$

The system is ergodic if and only if $\alpha \notin \mathbb{Q}$.
The system is not isomorphic to a group rotation, as can be seen by defining
$f(x, y) = e(y) = \exp(2\pi i y)$. Then for all $n \in \mathbb{Z}$,

$$T^n(x, y) = \Bigl(x + n\alpha, \; y + nx + \frac{n(n-1)}{2}\alpha\Bigr)$$

and so

$$f(T^n(x, y)) = e(y) \, e(nx) \, e\Bigl(\frac{n(n-1)}{2}\alpha\Bigr) .$$

Quadratic expressions like these do not arise from a rotation on a group.
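The closed form for $T^n$ in Example 5.12, and the cube-type relation of Example 5.11, can be checked numerically. The sketch below is an illustration only: the value of $\alpha$, the sample point, the step count, and the helper names are assumptions. It uses the observation that $F(x, y) = e(y)$ and $f(x, y) = e(x)$ satisfy $F \circ T = f \cdot F$ and $f \circ T = e(\alpha) f$ for the skew map.

```python
import cmath
import math

ALPHA = math.sqrt(2) - 1  # assumed irrational rotation number


def T(point):
    """One step of the skew map T(x, y) = (x + alpha, y + x) on the 2-torus."""
    x, y = point
    return ((x + ALPHA) % 1.0, (y + x) % 1.0)


def iterate(point, n):
    """Apply T to the point n times (n >= 0)."""
    for _ in range(n):
        point = T(point)
    return point


def closed_form(point, n):
    """T^n(x, y) = (x + n*alpha, y + n*x + n(n-1)/2 * alpha) mod 1, for n >= 0."""
    x, y = point
    return ((x + n * ALPHA) % 1.0,
            (y + n * x + (n * (n - 1) // 2) * ALPHA) % 1.0)


def circle_dist(a, b):
    """Distance between two points of the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def F(point):
    """F(x, y) = e(y) = exp(2*pi*i*y)."""
    return cmath.exp(2j * math.pi * point[1])


p = (0.3, 0.7)  # assumed sample point
n = 5           # assumed step
lhs = F(p)
rhs = F(iterate(p, n)) ** 3 * F(iterate(p, 2 * n)) ** (-3) * F(iterate(p, 3 * n))
```

Up to floating-point error, `lhs` and `rhs` agree, which is the relation among $(x, T^n x, T^{2n} x, T^{3n} x)$ described in Example 5.11.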

6. The structure theorem 

6.1. Major steps in the proof of Theorem 3.1. In broad terms, there are
four major steps in the proof of Theorem 3.1.

For each $k \in \mathbb{N}$, we inductively define a seminorm $|||\cdot|||_k$ that controls the
asymptotic behavior of the average. More precisely, we show that if $|f_1| \le 1, \dots, |f_k| \le 1$,
then

$$(6.1) \qquad \limsup_{N\to\infty} \Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} T^n f_1 \cdot T^{2n} f_2 \cdots T^{kn} f_k \Bigr\|_{L^2(\mu)} \le \min_{1 \le j \le k} j \, |||f_j|||_k .$$

Using these seminorms, we define factors $Z_k$ of $X$ such that for $f \in L^\infty(\mu)$,
$E(f \mid \mathcal{Z}_{k-1}) = 0$ if and only if $|||f|||_k = 0$.

It follows from (6.1) that the factor $Z_{k-1}$ is characteristic for the average (3.1).

The bulk of the work is then to give a "geometric" description of these factors.
This description is in terms of nilpotent groups, and more precisely we show that the
dynamics of translation on homogeneous spaces of a nilpotent Lie group determines
the limiting behavior of these averages. This is the content of the structure theorem,
explained in Section 6.2. (A more detailed expository version of this is given in
Host [31]; for full details, see [31].)

Finally, we show convergence for these particular types of systems. 

Roughly speaking, this same outline applies to other convergence results we 
consider in the sequel, such as averages along polynomial times, averages along 
cubes, or averages for commuting transformations. For each average, we find a 
characteristic factor that can be described in geometric terms, allowing us to prove 
convergence in the characteristic factor. 

6.2. The role of nilsystems. We have already seen that the limit behavior of 
the double average is controlled by group rotations, meaning the Kronecker factor 
is characteristic for this average. Furthermore, we have seen that something more 
is needed to control the limit behavior of the triple average. Our goal here is to 
explain how the multiple averages of (3.1), and some more general averages, are
controlled by nilsystems. We start with some terminology. 

Let $G$ be a group. If $g, h \in G$, let $[g, h] = g^{-1} h^{-1} g h$ denote the commutator
of $g$ and $h$. If $A, B \subset G$, we write $[A, B]$ for the subgroup of $G$ spanned by
$\{[a, b] : a \in A, b \in B\}$. The lower central series

$$G = G_1 \supset G_2 \supset \cdots \supset G_j \supset G_{j+1} \supset \cdots$$

of $G$ is defined inductively, setting $G_1 = G$ and $G_{j+1} = [G, G_j]$ for $j \ge 1$. We say
that $G$ is k-step nilpotent if $G_{k+1} = \{1\}$.
If $G$ is a $k$-step nilpotent Lie group and $\Gamma$ is a discrete cocompact subgroup,
the compact manifold $X = G/\Gamma$ is a k-step nilmanifold.

The group $G$ acts naturally on $X$ by left translation: if $a \in G$ and $x \in X$,
the translation $T_a$ by $a$ is given by $T_a(x\Gamma) = (ax)\Gamma$. There is a unique Borel
probability measure $\mu$ (the Haar measure) on $X$ that is invariant under this action.
Fixing an element $a \in G$, the system $(G/\Gamma, \mathcal{G}/\Gamma, T_a, \mu)$ is a k-step nilsystem and $T_a$
is a nilrotation.
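The lower central series defined above can be computed by brute force in a small finite group. The sketch below is an illustration under stated assumptions: it uses the Heisenberg-type multiplication $(x, y, z) * (x', y', z') = (x + x', y + y', z + z' + x y')$ with entries in $\mathbb{Z}/3\mathbb{Z}$ (a finite stand-in, not one of the Lie groups of the text), and all function names are hypothetical.

```python
from itertools import product

P = 3  # assumed modulus for the finite Heisenberg-type group

IDENTITY = (0, 0, 0)


def mul(g, h):
    """(x, y, z) * (x', y', z') = (x + x', y + y', z + z' + x*y') mod P."""
    x, y, z = g
    x2, y2, z2 = h
    return ((x + x2) % P, (y + y2) % P, (z + z2 + x * y2) % P)


def inv(g):
    """Inverse for the multiplication above."""
    x, y, z = g
    return ((-x) % P, (-y) % P, (-z + x * y) % P)


def commutator(g, h):
    """[g, h] = g^{-1} h^{-1} g h, as in the text."""
    return mul(mul(inv(g), inv(h)), mul(g, h))


def generated_subgroup(generators):
    """Closure under multiplication; in a finite group this is the subgroup."""
    elements = {IDENTITY} | set(generators)
    frontier = list(elements)
    while frontier:
        g = frontier.pop()
        for h in list(elements):
            for candidate in (mul(g, h), mul(h, g)):
                if candidate not in elements:
                    elements.add(candidate)
                    frontier.append(candidate)
    return elements


def lower_central_series(group):
    """Return [G_1, G_2, ...] with G_{j+1} = [G, G_j], until trivial or stable."""
    series = [set(group)]
    while len(series[-1]) > 1:
        nxt = generated_subgroup(
            {commutator(g, h) for g in group for h in series[-1]}
        )
        if nxt == series[-1]:
            break  # series stabilized without reaching {1}: not nilpotent
        series.append(nxt)
    return series


G = set(product(range(P), repeat=3))
series = lower_central_series(G)
sizes = [len(S) for S in series]
```

For this group the series reaches the trivial subgroup at $G_3$, so the group is 2-step nilpotent in the sense just defined; $G_2$ consists of the elements $(0, 0, z)$.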

The system $(X, \mathcal{X}, \mu, T)$ is an inverse limit of a sequence of factors $\{(X_j, \mathcal{X}_j, \mu_j, T)\}$
if $\{\mathcal{X}_j\}_{j \in \mathbb{N}}$ is an increasing sequence of $T$-invariant sub-$\sigma$-algebras such that
$\bigvee_{j \in \mathbb{N}} \mathcal{X}_j = \mathcal{X}$ up to null sets. If each system $(X_j, \mathcal{X}_j, \mu_j, T)$ is isomorphic to a $k$-step nilsystem,
then $(X, \mathcal{X}, \mu, T)$ is an inverse limit of k-step nilsystems.

Proving convergence of the averages is only possible if one has a good
description of some characteristic factor for these averages. This is the content of
the structure theorem:

Theorem 6.1 (Host and Kra [35]). There exists a characteristic factor for the
averages (3.1) which is isomorphic to an inverse limit of $(k-1)$-step nilsystems.

6.3. Examples of nilsystems. We give two examples of nilsystems that il- 
lustrate their general properties. 

Example 6.2. Let $G = \mathbb{Z} \times \mathbb{T} \times \mathbb{T}$ with multiplication given by

$$(k, x, y) * (k', x', y') = (k + k', \; x + x' \ (\mathrm{mod}\ 1), \; y + y' + 2kx' \ (\mathrm{mod}\ 1)) .$$

The commutator subgroup of $G$ is $\{0\} \times \{0\} \times \mathbb{T}$, and so $G$ is 2-step nilpotent. The
subgroup $\Gamma = \mathbb{Z} \times \{0\} \times \{0\}$ is discrete and cocompact, and thus $X = G/\Gamma$ is a
nilmanifold. Let $\mathcal{X}$ denote the Borel $\sigma$-algebra and let $\mu$ denote Haar measure on
$X$. Fix some irrational $\alpha \in \mathbb{T}$, let $a = (1, \alpha, \alpha)$, and let $T \colon X \to X$ be translation
by $a$. Then $(X, \mu, T)$ is a 2-step nilsystem.

The Kronecker factor of $X$ is $\mathbb{T}$ with rotation by $\alpha$. Identifying $X$ with $\mathbb{T}^2$ via
the map $(k, x, y) \mapsto (x, y)$, the transformation $T$ takes on the familiar form of a
skew transformation:

$$T(x, y) = (x + \alpha, y + 2x + \alpha) .$$

This system is ergodic if and only if $\alpha \notin \mathbb{Q}$: for $(x, y) \in X$ and $n \in \mathbb{Z}$,

$$T^n(x, y) = (x + n\alpha, \; y + 2nx + n^2\alpha)$$

and equidistribution of the sequence $\{T^n(x, y)\}$ is equivalent to ergodicity.
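The passage from the group law of Example 6.2 to the skew transformation, and the closed form for $T^n$, can be verified numerically. The following sketch is only an illustration: the value of $\alpha$, the sample point, and the function names are assumptions, and floating-point arithmetic modulo 1 replaces exact arithmetic on the torus.

```python
import math

ALPHA = math.sqrt(2) - 1  # assumed irrational parameter


def mul(g, h):
    """(k, x, y) * (k', x', y') = (k + k', x + x' mod 1, y + y' + 2*k*x' mod 1)."""
    k, x, y = g
    k2, x2, y2 = h
    return (k + k2, (x + x2) % 1.0, (y + y2 + 2 * k * x2) % 1.0)


A = (1, ALPHA, ALPHA)  # the translating element a = (1, alpha, alpha)


def T(point):
    """Left translation by a, read off on the coset representative (0, x, y)."""
    x, y = point
    _, x_new, y_new = mul(A, (0, x, y))
    return (x_new, y_new)


def closed_form(point, n):
    """T^n(x, y) = (x + n*alpha, y + 2*n*x + n^2*alpha) mod 1, for n >= 0."""
    x, y = point
    return ((x + n * ALPHA) % 1.0, (y + 2 * n * x + n * n * ALPHA) % 1.0)


def circle_dist(a, b):
    """Distance between two points of the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


orbit = [(0.3, 0.7)]  # assumed starting point
for _ in range(30):
    orbit.append(T(orbit[-1]))
```

Each entry `orbit[n]` agrees with `closed_form(orbit[0], n)` up to rounding, which exhibits the quadratic term $n^2\alpha$ in the second coordinate.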

Example 6.3. Let $G$ be the Heisenberg group $\mathbb{R} \times \mathbb{R} \times \mathbb{R}$ with multiplication
given by

$$(x, y, z) * (x', y', z') = (x + x', \; y + y', \; z + z' + xy') .$$

Then $G$ is a 2-step nilpotent Lie group. The subgroup $\Gamma = \mathbb{Z} \times \mathbb{Z} \times \mathbb{Z}$ is discrete
and cocompact and so $X = G/\Gamma$ is a nilmanifold. Letting $T$ be the translation by
$a = (a_1, a_2, a_3) \in G$ where $a_1, a_2$ are independent over $\mathbb{Q}$ and $a_3 \in \mathbb{R}$, and taking
$\mathcal{X}$ to be the Borel $\sigma$-algebra and $\mu$ the Haar measure, we have that $(X, \mathcal{X}, \mu, T)$ is
a nilsystem. The system is ergodic if and only if $a_1, a_2$ are independent over $\mathbb{Q}$.

The compact abelian group $G/G_2\Gamma$ is isomorphic to $\mathbb{T}^2$ and the rotation on $\mathbb{T}^2$
by $(a_1, a_2)$ is ergodic. The Kronecker factor of $X$ is the factor induced by functions
of $x_1, x_2$.

The system $(X, \mathcal{X}, \mu, T)$ is (uniquely) ergodic.
The dynamics of the first example gives rise to quadratic sequences, such as
$\{n^2\alpha\}$, and the dynamics of the second example gives rise to generalized quadratic
sequences such as $\{\lfloor n\alpha \rfloor n\beta\}$.

6.4. Motivation for nilpotent groups. The content of the Structure The- 
orem is that nilpotent groups, or more precisely the dynamics of a translation on 
the homogeneous space of a nilpotent Lie group, control the limiting behavior of 
the averages along arithmetic progressions. We give some motivation as to why 
nilpotent groups arise. 

If $G$ is an abelian group, then

$$\{(g, gz, gz^2, \dots, gz^n) : g, z \in G\}$$

is a subgroup of $G^{n+1}$. However, this does not hold if $G$ is not abelian. To make
these arithmetic progressions into a group, one must take into account the
commutators. This is the content of the following theorem, proven in different contexts by
Hall, Petresco, Lazard [42], and Leibman [43]:

Theorem 6.4. If $G$ is a group, then for any $x, y \in G$, there exist $z \in G$ and
$w_i \in G_i$ such that

$$(x, x^2, x^3, \dots, x^n) \times (y, y^2, y^3, \dots, y^n) = \bigl(z, \; z^2 w_1, \; z^3 w_1^3 w_2, \; \dots, \; z^n w_1^{\binom{n}{2}} w_2^{\binom{n}{3}} \cdots w_{n-1}\bigr) .$$

Furthermore, these expressions form a group.

If $G$ is a group, a geometric progression is a sequence of the form

$$\bigl(g, \; gz, \; gz^2 w_1, \; gz^3 w_1^3 w_2, \; \dots, \; gz^n w_1^{\binom{n}{2}} w_2^{\binom{n}{3}} \cdots w_{n-1}\bigr) ,$$

where $g, z \in G$ and $w_i \in G_i$.

Thus if $G$ is abelian, $g$ and $z$ determine the whole sequence. On the other hand,
if $G$ is $k$-step nilpotent with $k < n$, the first $k$ terms determine the whole sequence.
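In a 2-step nilpotent group the identity of Theorem 6.4 reduces to $x^j y^j = z^j w_1^{\binom{j}{2}}$ with $z = xy$, since the higher correction terms are trivial. The sketch below checks this in the integer Heisenberg group with exact arithmetic. It is an illustration, not the general proof: the choice of group, the sample elements, and the function names are assumptions, and $w_1$ is simply solved for from the case $j = 2$.

```python
from math import comb

IDENTITY = (0, 0, 0)


def mul(g, h):
    """Heisenberg law over the integers: (x,y,z)*(x',y',z') = (x+x', y+y', z+z'+x*y')."""
    x, y, z = g
    x2, y2, z2 = h
    return (x + x2, y + y2, z + z2 + x * y2)


def inv(g):
    """Inverse for the multiplication above."""
    x, y, z = g
    return (-x, -y, -z + x * y)


def power(g, n):
    """g^n for n >= 0 by repeated multiplication."""
    result = IDENTITY
    for _ in range(n):
        result = mul(result, g)
    return result


def commutator(g, h):
    """[g, h] = g^{-1} h^{-1} g h."""
    return mul(mul(inv(g), inv(h)), mul(g, h))


def correction_term(x, y):
    """Solve x^2 y^2 = z^2 w_1 for w_1, where z = x*y."""
    z = mul(x, y)
    return mul(inv(power(z, 2)), mul(power(x, 2), power(y, 2)))


def progression_identity_holds(x, y, n):
    """Check x^j y^j == z^j * w_1^C(j,2) for 1 <= j <= n."""
    z = mul(x, y)
    w1 = correction_term(x, y)
    return all(
        mul(power(x, j), power(y, j)) == mul(power(z, j), power(w1, comb(j, 2)))
        for j in range(1, n + 1)
    )


x, y = (2, -1, 5), (3, 4, -2)  # assumed sample elements
ok = progression_identity_holds(x, y, 8)
w1 = correction_term(x, y)
```

In this example $w_1$ coincides with the commutator $[x, y]$ and lies in the center, so after the first two terms $z$ and $z^2 w_1$ the rest of the sequence is determined.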

Similarly, if $(G/\Gamma, \mathcal{G}/\Gamma, \mu, T_a)$ is a $k$-step nilsystem and

$$x_1 = g_1\Gamma, \; x_2 = g_2\Gamma, \; \dots, \; x_k = g_k\Gamma, \; \dots, \; x_n = g_n\Gamma$$

is a geometric progression, then the first $k$ terms determine the rest. Thus $a^{k+1}x\Gamma$
is a function of the first $k$ terms $ax\Gamma, a^2x\Gamma, \dots, a^kx\Gamma$.

This means that the $(k+1)$-st term $T^{(k+1)n}x$ in an arithmetic progression
$T^n x, \dots, T^{kn} x$ is constrained by the first $k$ terms. More interestingly, the converse
also holds: in an arbitrary system $(X, \mathcal{X}, \mu, T)$, any $k$-step nilpotent factor places
a constraint on $(x, T^n x, T^{2n} x, \dots, T^{kn} x)$.

7. Building characteristic factors 

The material in this and the next section is based on [34] and the reader is
referred to [34] for full proofs. To describe characteristic factors for the averages (3.1),
for each $k \in \mathbb{N}$ we define a seminorm and use it to define these factors. We start by
defining certain measures that are then used to define the seminorms. Throughout
this section, we assume that $(X, \mathcal{X}, \mu, T)$ is an ergodic system.

7.1. Definition of the measures. Let $X^{[k]} = X^{2^k}$ and define $T^{[k]} \colon X^{[k]} \to X^{[k]}$
by $T^{[k]} = T \times \cdots \times T$ (taken $2^k$ times).

We write a point $x \in X^{[k]}$ as $x = (x_\varepsilon : \varepsilon \in \{0,1\}^k)$ and make the natural
identification of $X^{[k+1]}$ with $X^{[k]} \times X^{[k]}$, writing $x = (x', x'')$ for a point of $X^{[k+1]}$,
with $x', x'' \in X^{[k]}$.

By induction, we define a measure $\mu^{[k]}$ on $X^{[k]}$ invariant under $T^{[k]}$. Set
$\mu^{[0]} := \mu$. Let $\mathcal{I}^{[k]}$ be the invariant $\sigma$-algebra of $(X^{[k]}, \mu^{[k]}, T^{[k]})$. (Note that
this system is not necessarily ergodic.) Then $\mu^{[k+1]}$ is defined to be the relatively
independent joining of $\mu^{[k]}$ with itself over $\mathcal{I}^{[k]}$, meaning that if $F$ and $G$ are bounded
functions on $X^{[k]}$,

$$(7.1) \qquad \int_{X^{[k+1]}} F(x') \cdot G(x'') \, d\mu^{[k+1]}(x) = \int_{X^{[k]}} E(F \mid \mathcal{I}^{[k]})(y) \cdot E(G \mid \mathcal{I}^{[k]})(y) \, d\mu^{[k]}(y) .$$

Since $(X, \mathcal{X}, \mu, T)$ is assumed to be ergodic, $\mathcal{I}^{[0]}$ is trivial and $\mu^{[1]} = \mu \times \mu$. If the
system is weak mixing, then for all $k \ge 1$, $\mu^{[k]}$ is the product measure $\mu \times \mu \times \cdots \times \mu$,
taken $2^k$ times.

7.2. Symmetries of the measures. Writing a point $x \in X^{[k]}$ as

$$x = (x_\varepsilon : \varepsilon \in \{0,1\}^k) ,$$

we identify the indexing set $\{0,1\}^k$ of this point with the vertices of the Euclidean
cube.

An isometry $\sigma$ of $\{0,1\}^k$ induces a map $\sigma_* \colon X^{[k]} \to X^{[k]}$ by permuting the
coordinates:

$$(\sigma_*(x))_\varepsilon = x_{\sigma(\varepsilon)} .$$

For example, from the diagonal symmetries for $k = 2$, we have the permutations

$$(x_{00}, x_{01}, x_{10}, x_{11}) \mapsto (x_{00}, x_{10}, x_{01}, x_{11})$$

$$(x_{00}, x_{01}, x_{10}, x_{11}) \mapsto (x_{11}, x_{01}, x_{10}, x_{00}) .$$

By induction, the measures are invariant under permutations:

Lemma 7.1. For each $k \in \mathbb{N}$, the measure $\mu^{[k]}$ is invariant under all
permutations of coordinates arising from isometries of the unit Euclidean cube.

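7.3. Defining seminorms. For each $k \in \mathbb{N}$, we define a seminorm on $L^\infty(\mu)$
by setting

$$|||f|||_k^{2^k} = \int_{X^{[k]}} \prod_{\varepsilon \in \{0,1\}^k} f(x_\varepsilon) \, d\mu^{[k]}(x) .$$

By definition of the measure $\mu^{[k]}$, this integral is equal to

$$\int_{X^{[k-1]}} E\Bigl( \prod_{\varepsilon \in \{0,1\}^{k-1}} f(x_\varepsilon) \Bigm| \mathcal{I}^{[k-1]} \Bigr)^2 \, d\mu^{[k-1]}$$

and so in particular it is nonnegative.

From the symmetries of the measure $\mu^{[k]}$ (Lemma 7.1), we have a version of
the Cauchy-Schwarz inequality for the seminorms, referred to as a
Cauchy-Schwarz-Gowers inequality:

Lemma 7.2. For $\varepsilon \in \{0,1\}^k$, let $f_\varepsilon \in L^\infty(\mu)$. Then

$$\Bigl| \int \prod_{\varepsilon \in \{0,1\}^k} f_\varepsilon(x_\varepsilon) \, d\mu^{[k]}(x) \Bigr| \le \prod_{\varepsilon \in \{0,1\}^k} |||f_\varepsilon|||_k .$$

As a corollary, the map $f \mapsto |||f|||_k$ is subadditive and so:

Corollary 7.3. For every $k \in \mathbb{N}$, $|||\cdot|||_k$ is a seminorm on $L^\infty(\mu)$.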

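Since the system $(X, \mathcal{X}, \mu, T)$ is ergodic, the $\sigma$-algebra $\mathcal{I}^{[0]}$ is trivial, $\mu^{[1]} = \mu \times \mu$
and

$$|||f|||_1 = \Bigl| \int f \, d\mu \Bigr| .$$

By induction,

$$|||f|||_1 \le |||f|||_2 \le \cdots \le |||f|||_k \le \cdots \le \|f\|_\infty .$$

If the system is weak mixing, then $|||f|||_k = |||f|||_1$ for all $k \in \mathbb{N}$.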

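By induction and the ergodic theorem, we have a second presentation of these
seminorms:

Lemma 7.4. For every $k \ge 1$,

$$|||f|||_{k+1}^{2^{k+1}} = \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} |||f \cdot T^n f|||_k^{2^k} .$$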
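The recursion of Lemma 7.4 has a finite analog that is easy to compute. In the sketch below the system is replaced by $\mathbb{Z}/N\mathbb{Z}$ with the shift $z \mapsto z + 1$, so that the limit over $n$ becomes an average over $h \in \mathbb{Z}/N\mathbb{Z}$; complex conjugates are included so that complex-valued functions are handled. This finite model, the sample functions, and the function names are assumptions made for illustration.

```python
import cmath
import math


def shift_product(f, h):
    """Return the function z -> f(z) * conj(f(z + h)) on Z/N."""
    N = len(f)
    return [f[z] * f[(z + h) % N].conjugate() for z in range(N)]


def uniformity_power(f, k):
    """Return ||f||_{U^k}^(2^k) on Z/N, using the recursion
    ||f||_{U^(k+1)}^(2^(k+1)) = E_h ||f * conj(f(. + h))||_{U^k}^(2^k),
    started from ||f||_{U^1}^2 = |E f|^2."""
    N = len(f)
    if k == 1:
        return abs(sum(f) / N) ** 2
    return sum(uniformity_power(shift_product(f, h), k - 1) for h in range(N)) / N


def uniformity_norm(f, k):
    """Return ||f||_{U^k}; the max guards against tiny negative rounding."""
    return max(uniformity_power(f, k), 0.0) ** (1.0 / 2 ** k)


N = 7
quadratic_phase = [cmath.exp(2j * math.pi * z * z / N) for z in range(N)]
norms = [uniformity_norm(quadratic_phase, k) for k in (1, 2, 3)]
```

For the quadratic phase $z \mapsto e(z^2/7)$ the computed values increase with $k$ and the third one equals 1 up to rounding, in line with the chain of inequalities $|||f|||_1 \le |||f|||_2 \le \cdots \le \|f\|_\infty$ above.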



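7.4. Seminorms control the averages (3.1). The seminorms $|||\cdot|||_k$ control
the averages along arithmetic progressions:

Lemma 7.5. Assume that $(X, \mathcal{X}, \mu, T)$ is ergodic and let $k \in \mathbb{N}$. If $\|f_1\|_\infty, \dots,
\|f_k\|_\infty \le 1$, then

$$\limsup_{N\to\infty} \Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} T^n f_1 \cdot T^{2n} f_2 \cdots T^{kn} f_k \Bigr\|_2 \le \min_{\ell = 1, \dots, k} \ell \, |||f_\ell|||_k .$$

Proof. We proceed by induction on $k$. For $k = 1$ this is trivial. Assume it
holds for $k \ge 1$. Define $u_n = T^n f_1 \cdot T^{2n} f_2 \cdots T^{(k+1)n} f_{k+1}$, and assume that $\ell > 1$
(the case $\ell = 1$ is similar). Then

$$\Bigl| \frac{1}{N} \sum_{n=0}^{N-1} \langle u_{n+h}, u_n \rangle \Bigr| = \Bigl| \int (f_1 \cdot T^h f_1) \, \frac{1}{N} \sum_{n=0}^{N-1} \prod_{j=2}^{k+1} T^{(j-1)n} (f_j \cdot T^{jh} f_j) \, d\mu \Bigr| \le \Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} \prod_{j=2}^{k+1} T^{(j-1)n} (f_j \cdot T^{jh} f_j) \Bigr\|_2 .$$

By the induction hypothesis, $\gamma_h \le \ell \, |||f_\ell \cdot T^{\ell h} f_\ell|||_k$. Thus

$$\frac{1}{H} \sum_{h=0}^{H-1} \gamma_h \le \ell \cdot \frac{1}{H} \sum_{h=0}^{H-1} |||f_\ell \cdot T^{\ell h} f_\ell|||_k$$

and the statement follows from the van der Corput Lemma (Lemma 5.9) and the
definition of the seminorm $|||\cdot|||_{k+1}$. □

7.5. The Kronecker factor, revisited ($k = 2$). We have seen two
presentations of the Kronecker factor $(Z_1, \mathcal{Z}_1, m, T)$: it is the largest abelian group rotation
factor and it is the sub-$\sigma$-algebra of $\mathcal{X}$ generated by the eigenfunctions. Another
equivalent formulation is that it is the smallest sub-$\sigma$-algebra of $\mathcal{X}$ such that all
invariant functions of $(X \times X, \mathcal{X} \times \mathcal{X}, \mu \times \mu, T \times T)$ are measurable with respect to
$\mathcal{Z}_1 \times \mathcal{Z}_1$. Recall that $\pi_1 \colon X \to Z_1$ denotes the factor map.

We give an explicit description of the measure $\mu^{[2]}$, and thus give yet another
description of the Kronecker factor. For $f \in L^\infty(\mu)$, write $\tilde{f} = E(f \mid \mathcal{Z}_1)$.

For $s \in Z_1$ and $f_0, f_1 \in L^\infty(\mu)$, we define a probability measure $\mu_s$ on $X \times X$
by

$$\int_{X \times X} f_0(x_0) f_1(x_1) \, d\mu_s(x_0, x_1) := \int_{Z_1} \tilde{f}_0(z) \tilde{f}_1(z + s) \, dm(z) .$$

This measure is $T \times T$-invariant and the ergodic decomposition of $\mu \times \mu$ under $T \times T$
is given by

$$\mu \times \mu = \int_{Z_1} \mu_s \, dm(s) .$$

Thus for $m$-almost every $s \in Z_1$, the system $(X \times X, \mathcal{X} \times \mathcal{X}, \mu_s, T \times T)$ is ergodic
and

$$\mu^{[2]} = \int_{Z_1} \mu_s \times \mu_s \, dm(s) .$$

More generally, if $f_\varepsilon$, $\varepsilon \in \{0,1\}^2$, are measurable functions on $X$, then

$$\int_{X^{[2]}} f_{00} \otimes f_{01} \otimes f_{10} \otimes f_{11} \, d\mu^{[2]} = \int_{Z_1^3} \tilde{f}_{00}(z) \cdot \tilde{f}_{01}(z + s) \cdot \tilde{f}_{10}(z + t) \cdot \tilde{f}_{11}(z + s + t) \, dm(z) \, dm(s) \, dm(t) .$$

It follows immediately that:

$$|||f|||_2^4 = \int_{X^{[2]}} f \otimes f \otimes f \otimes f \, d\mu^{[2]} = \int_{Z_1^3} \tilde{f}(z) \cdot \tilde{f}(z + s) \cdot \tilde{f}(z + t) \cdot \tilde{f}(z + s + t) \, dm(z) \, dm(s) \, dm(t) .$$

As a corollary, $|||f|||_2$ is the $\ell^4$-norm of the Fourier Transform of $\tilde{f}$ and the factor
$Z_1$, defined by $|||f|||_2 = 0$ if and only if $E(f \mid \mathcal{Z}_1) = 0$ for $f \in L^\infty(\mu)$, is the Kronecker
factor of $(X, \mathcal{X}, \mu, T)$.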
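The statement that $|||f|||_2$ is the $\ell^4$-norm of the Fourier transform can be tested on a finite group rotation. In the sketch below the Kronecker factor is replaced by $\mathbb{Z}/N\mathbb{Z}$ with normalized counting measure, which is an assumption made for illustration, as are the sample function and the function names. Complex conjugates are placed on the two middle factors so that the identity also holds for complex-valued functions; for real-valued $f$ this is the triple integral displayed above.

```python
import cmath
import math


def fourier_coefficients(f):
    """Normalized Fourier transform on Z/N: f^(xi) = E_z f(z) e(-z*xi/N)."""
    N = len(f)
    return [
        sum(f[z] * cmath.exp(-2j * math.pi * z * xi / N) for z in range(N)) / N
        for xi in range(N)
    ]


def cube_average(f):
    """E_{z,s,t} f(z) conj(f(z+s)) conj(f(z+t)) f(z+s+t) over Z/N."""
    N = len(f)
    total = 0j
    for z in range(N):
        for s in range(N):
            for t in range(N):
                total += (
                    f[z]
                    * f[(z + s) % N].conjugate()
                    * f[(z + t) % N].conjugate()
                    * f[(z + s + t) % N]
                )
    return (total / N ** 3).real


def fourth_power_sum(f):
    """Sum over xi of |f^(xi)|^4, the fourth power of the l^4 norm."""
    return sum(abs(c) ** 4 for c in fourier_coefficients(f))


sample = [1.0, -0.5, 2.0, 0.0, 0.25, -1.5, 0.75, 1.0]  # assumed test function
lhs = cube_average(sample)
rhs = fourth_power_sum(sample)
```

The two quantities agree up to rounding: averaging over $t$ first turns the cube average into $E_s |E_z f(z)\overline{f(z+s)}|^2$, and Parseval's identity converts this into the sum of the fourth powers of the Fourier coefficients.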



7.6. Factors for all $k \ge 1$. Using these seminorms, we define factors $Z_k =
Z_k(X)$ for $k \ge 1$ of $X$ that generalize the relation between the Kronecker factor
$Z_1$ and the second seminorm $|||\cdot|||_2$: for $f \in L^\infty(\mu)$, $E(f \mid \mathcal{Z}_k) = 0$ if and only if
$|||f|||_{k+1} = 0$. To explain this, we start by describing some geometric properties of
the measures $\mu^{[k]}$.

Indexing $X^{[k]}$ by the coordinates $\{0,1\}^k$ of the Euclidean cube, it is natural to
use geometric terms like side, edge, vertex for subsets of $\{0,1\}^k$. For example, the
following illustrates the point $x \in X^{[3]}$ with the side $\alpha = \{010, 011, 110, 111\}$:

[Figure: the point $x \in X^{[3]}$ drawn as a cube with vertices $x_{000}, x_{001}, x_{010}, x_{011}, x_{100}, x_{101}, x_{110}, x_{111}$.]

Let $\alpha \subset \{0,1\}^k$ be a side. The side transformation $T_\alpha^{[k]}$ of $X^{[k]}$ is defined by:

$$(T_\alpha^{[k]} x)_\varepsilon = \begin{cases} T x_\varepsilon & \text{if } \varepsilon \in \alpha ; \\ x_\varepsilon & \text{otherwise.} \end{cases}$$

We can represent the transformation $T_\alpha$ associated to the side $\{010, 011, 110, 111\}$
by:

[Figure: the same cube with vertices $x_{000}, x_{001}, x_{100}, x_{101}$ unchanged and vertices $Tx_{010}, Tx_{011}, Tx_{110}, Tx_{111}$ on the side $\alpha$.]

Since permutations of coordinates leave the measure $\mu^{[k]}$ invariant and act
transitively on the sides, we have:

Lemma 7.6. For all $k \in \mathbb{N}$, the measure $\mu^{[k]}$ is invariant under the side
transformations.

We now view $X^{[k]}$ in a different way, identifying $X^{[k]} = X \times X^{2^k - 1}$. A point
$x \in X^{[k]}$ is now written as

$$x = (x_0, \tilde{x}) \quad \text{where } \tilde{x} \in X^{2^k - 1}, \; x_0 \in X, \text{ and } 0 = (00 \dots 0) \in \{0,1\}^k .$$

Although the $0$ coordinate has been singled out and seems to play a particular role,
it follows from the symmetries of the measure $\mu^{[k]}$ (Lemma 7.1) that any other
coordinate could have been used instead.

If $\alpha \subset \{0,1\}^k$ is a side that does not contain $0$ (there are $k$ such sides), the
transformation $T_\alpha^{[k]}$ leaves the coordinate $0$ invariant. It follows from induction and
the definition of the measure that:

PROPOSITION 7.7. Let $k \in \mathbb{N}$. If $B \subset X^{2^k - 1}$, there exists $A \subset X$ with

$$(7.2) \qquad \mathbf{1}_A(x_0) = \mathbf{1}_B(\tilde{x}) \quad \text{for } \mu^{[k]}\text{-almost all } x \in X^{[k]}$$

if and only if $X \times B$ is invariant under the $k$ transformations $T_\alpha^{[k]}$ arising from the
$k$ sides $\alpha$ not containing $0$.

This means that the subsets $A \subset X$ such that there exists $B \subset X^{2^k - 1}$
satisfying (7.2) form an invariant sub-$\sigma$-algebra $\mathcal{Z}_{k-1} = \mathcal{Z}_{k-1}(X)$ of $\mathcal{X}$. We define
$Z_{k-1} = Z_{k-1}(X)$ to be the associated factor. Thus $\mathcal{Z}_{k-1}(X)$ is defined to be the
sub-$\sigma$-algebra of sets $A \subset X$ such that Equation (7.2) holds for some set $B \subset X^{2^k - 1}$.

We give some properties of the factors:

Proposition 7.8. (1) For every bounded function $f$ on $X$,

$$|||f|||_k = 0 \text{ if and only if } E(f \mid \mathcal{Z}_{k-1}) = 0 .$$

(2) For bounded functions $f_\varepsilon$, $\varepsilon \in \{0,1\}^k$, on $X$,

$$\int \prod_{\varepsilon \in \{0,1\}^k} f_\varepsilon(x_\varepsilon) \, d\mu^{[k]}(x) = \int \prod_{\varepsilon \in \{0,1\}^k} E(f_\varepsilon \mid \mathcal{Z}_{k-1})(x_\varepsilon) \, d\mu^{[k]}(x) .$$

Furthermore, $\mathcal{Z}_{k-1}$ is the smallest sub-$\sigma$-algebra of $\mathcal{X}$ with this property.
(3) The invariant sets of $(X^{[k]}, \mathcal{X}^{[k]}, \mu^{[k]}, T^{[k]})$ are measurable with respect
to $\mathcal{Z}_k^{[k]}$. Furthermore, $\mathcal{Z}_k$ is the smallest sub-$\sigma$-algebra of $\mathcal{X}$ with this
property.

The proof of this proposition relies on showing a similar formula to that used
(in Equation (7.1)) to define the measures $\mu^{[k]}$, but with respect to the new
identification separating 1 coordinate from the $2^k - 1$ others. Namely, for bounded
functions $f$ on $X$ and $F$ on $X^{2^k - 1}$,

$$\int_{X^{[k]}} f(x_0) \cdot F(\tilde{x}) \, d\mu^{[k]}(x) = \int E(f \mid \mathcal{Z}_{k-1}) \cdot E(F \mid \mathcal{Z}_{k-1}) .$$

The given properties then follow using induction and the symmetries of the
measures.

We have already seen that $Z_0$ is the trivial factor and $Z_1$ is the Kronecker
factor. More generally, the sequence of factors is increasing:

$$Z_0 \leftarrow Z_1 \leftarrow \cdots \leftarrow Z_k \leftarrow Z_{k+1} \leftarrow \cdots \leftarrow X .$$

If $X$ is weak mixing, then $Z_k(X)$ is the trivial factor for every $k$.

An immediate consequence of Lemma 7.5 and the definition of the factors is
that the factor $Z_{k-1}$ is characteristic for the average along arithmetic progressions:

Proposition 7.9. For all $k \ge 1$, the factor $Z_{k-1}$ is characteristic for the
convergence of the averages

$$\frac{1}{N} \sum_{n=0}^{N-1} T^n f_1 \cdot T^{2n} f_2 \cdots T^{kn} f_k .$$

8. Structure theorem

8.1. Systems of order k. For $k \ge 0$, an ergodic system $X$ is said to be of
order k if $Z_k(X) = X$. This means that $|||\cdot|||_{k+1}$ is a norm on $L^\infty(\mu)$.

Given an ergodic system $(X, \mathcal{X}, \mu, T)$, $Z_k(X)$ is a system of order $k$, since
$Z_k(Z_k(X)) = Z_k(X)$. The unique system of order zero is the trivial system, and
a system of order 1 is an ergodic rotation. By definition, if a system is of order $k$,
then it is also of order $k'$ for any $k' > k$.

By Proposition 7.9, to show convergence of

$$\frac{1}{N} \sum_{n=0}^{N-1} T^n f_1 \cdot T^{2n} f_2 \cdots T^{kn} f_k$$

on an arbitrary system, it suffices to assume that each function is defined on the
factor $Z_{k-1}$. But since $Z_{k-1}(X)$ is a system of order $k - 1$, it suffices to prove convergence
of this average for systems of order $k - 1$.

In this language, the structure theorem becomes:

Theorem 8.1 (Host and Kra [35]). A system of order $k$ is the inverse limit of
a sequence of $k$-step nilsystems.
Before turning to the proof of the structure theorem, we show convergence for 
the average along arithmetic progressions in a nilsystem. 

8.2. Convergence on a nilmanifold. Using general properties of
nilmanifolds (see Furstenberg [15] and Parry [47]), Lesigne [46] showed for connected
group $G$ and Leibman [44] showed in the general case, convergence in a nilsystem:

Theorem 8.2. If $(X = G/\Gamma, \mathcal{G}/\Gamma, \mu, T)$ is a nilsystem and $f$ is a continuous
function on $X$, then

$$\frac{1}{N} \sum_{n=0}^{N-1} f(T^n x)$$

converges for every $x \in X$.

(See also Ratner [50] and Shah [53] for related convergence results.)
As a corollary, we have convergence in $L^2(\mu)$ for the average along arithmetic
progressions in a nilmanifold:

Corollary 8.3. If $(X = G/\Gamma, \mathcal{G}/\Gamma, \mu, T)$ is a nilsystem, $k \in \mathbb{N}$, and $f_1, f_2, \dots, f_k \in
L^\infty(\mu)$, then

$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} T^n f_1 \cdot T^{2n} f_2 \cdots T^{kn} f_k$$

exists in $L^2(\mu)$.

Proof. By density, we can assume that the functions are continuous. By
assumption, $G^k$ is a nilpotent Lie group, $\Gamma^k$ is a discrete cocompact subgroup and
$X^k = G^k/\Gamma^k$ is a nilmanifold. Let

$$s = (t, t^2, \dots, t^k) \in G^k$$

and let $S \colon X^k \to X^k$ be the translation by $s$, meaning that

$$S = T \times T^2 \times \cdots \times T^k .$$

We apply Theorem 8.2 to $(X^k, S)$ with the continuous function

$$F(x_1, x_2, \dots, x_k) = f_1(x_1) f_2(x_2) \cdots f_k(x_k)$$

at the point $y = (x, x, \dots, x)$ and so the averages converge everywhere. □

Thus Theorem 3.1 holds in a nilsystem, and we are left with proving the
Structure Theorem.

8.3. A group of transformations. To each ergodic system, we associate a 
group of measure preserving transformations. The general approach is to show that 
for sufficiently many systems of order k, this group is a nilpotent Lie group. The 
bulk of the work is to then show that this group acts transitively on the system. 
Thus the system can be given the structure of a nilmanifold and the Structure 
Theorem follows. 

Most proofs are sketched or omitted completely, and the reader is referred
to [35] for the details.

Let $(X, \mathcal{X}, \mu, T)$ be an ergodic system. If $S \colon X \to X$ and $\alpha \subset \{0,1\}^k$, define
$S_\alpha^{[k]} \colon X^{[k]} \to X^{[k]}$ by:

$$(S_\alpha^{[k]} x)_\varepsilon = \begin{cases} S x_\varepsilon & \text{if } \varepsilon \in \alpha ; \\ x_\varepsilon & \text{otherwise.} \end{cases}$$

Let $\mathcal{G} = \mathcal{G}(X)$ be the group of transformations $S \colon X \to X$ such that for all
$k \in \mathbb{N}$ and all sides $\alpha \subset \{0,1\}^k$, the measure $\mu^{[k]}$ is invariant under $S_\alpha^{[k]}$.

Some properties of this group are immediate. By symmetry, it suffices to
consider one side. By definition, $T \in \mathcal{G}$, and if $ST = TS$ then we also have that $S \in \mathcal{G}$.
If $S \in \mathcal{G}$ and $k \in \mathbb{N}$, then $\mu^{[k]}$ is invariant under $S^{[k]} \colon X^{[k]} \to X^{[k]}$. Furthermore,
$S^{[k]} E = E$ for every $E \in \mathcal{I}^{[k]}$.

By induction, the invariance of the measure under the side transformations,
and commutator relations, we have:

PROPOSITION 8.4. If $X$ is a system of order $k$, then $\mathcal{G}(X)$ is a $k$-step nilpotent
group.

8.4. Proof of the structure theorem. We proceed by induction. By the
inductive assumption, we can assume that we are given a system $(X, \mathcal{X}, \mu, T)$ of
order $k$. We have a factor $(Y, \mathcal{Y}, \nu, T)$, where $Y = Z_{k-1}(X)$ and $\pi \colon X \to Y$ is
the factor map. Furthermore, $Y$ is an inverse limit of a sequence of $(k-1)$-step
nilsystems

$$Y = \varprojlim Y_i ; \qquad Y_i = G_i/\Gamma_i .$$

We want to show that $X$ is an inverse limit of $k$-step nilsystems.

We have already shown that if $f_\varepsilon$, $\varepsilon \in \{0,1\}^k$, are bounded functions on $X$,
then

$$\int \prod_{\varepsilon \in \{0,1\}^k} f_\varepsilon(x_\varepsilon) \, d\mu^{[k]}(x) = \int \prod_{\varepsilon \in \{0,1\}^k} E(f_\varepsilon \mid \mathcal{Y})(x_\varepsilon) \, d\mu^{[k]}(x) .$$

In particular, for $f \in L^\infty(\mu)$,

$$|||f|||_k = 0 \text{ if and only if } E(f \mid \mathcal{Y}) = 0 .$$

Furthermore, $\mathcal{X}$ does not admit a strict sub-$\sigma$-algebra $\mathcal{Z}$ such that all invariant sets
of $(X^{[k]}, \mu^{[k]}, T^{[k]})$ are measurable with respect to $\mathcal{Z}^{[k]}$. Recall also that the system
$(X^{[k]}, \mu^{[k]}, T^{[k]})$ is defined as a relatively independent joining.

In [16], Furstenberg described the invariant $\sigma$-algebra for relatively independent
joinings. It follows that $X$ is an isometric extension of $Y$, meaning that $X =
Y \times H/K$ where $H$ is a compact group and $K$ is a closed subgroup, $\mu = \nu \times m$,
where $m$ is the Haar measure of $H/K$, and the transformation $T$ is given by

$$T(y, u) = (Ty, \rho(y) \cdot u)$$

for some map $\rho \colon Y \to H$.

Lemma 8.5. For every $h \in H$, the transformation $(y, u) \mapsto (y, h \cdot u)$ of $X$
belongs to the center of $\mathcal{G}(X)$.

Thus $H$ is abelian. We can substitute $H/K$ for $H$, and we use additive notation
for $H$.

We therefore have more information: $X$ is an abelian extension of $Y$, meaning
that $X = Y \times H$ for some compact abelian group $H$, $\mu = \nu \times m$, where $m$ is the
Haar measure of $H$, and the transformation $T$ is given by $T(y, u) = (Ty, u + \rho(y))$
for some map $\rho \colon Y \to H$. We call $\rho$ the cocycle defining the extension.

Furthermore, we show that the cocycle defining this extension has a particular
form:

PROPOSITION 8.6 (The functional equation). If $(X, \mathcal{X}, \mu, T)$ is a system of
order $k$ and $(Y, \mathcal{Y}, \nu, T) = Z_{k-1}(X)$, then $X$ is an abelian extension of $Y$ via a
compact group $H$ and for the cocycle $\rho$ defining this extension, there exists a map
$\Phi \colon Y^{[k]} \to H$ such that

$$(8.1) \qquad \sum_{\varepsilon \in \{0,1\}^k} (-1)^{\varepsilon_1 + \cdots + \varepsilon_k} \rho(y_\varepsilon) = \Phi(T^{[k]} y) - \Phi(y)$$

for $\nu^{[k]}$-a.e. $y \in Y^{[k]}$.

We can make a few more assumptions on our system. Namely, by induction
we can deduce that $H$ is connected. Since every connected compact abelian group
$H$ is an inverse limit of a sequence of tori, we can further reduce to the case that
$H = \mathbb{T}^d$.

8.5. The case k = 2 (The Conze-Lesigne Equation). We maintain
notation of the preceding section and review what this means for the case $k = 2$. By
assumption, we have that $(Y, \mathcal{Y}, \nu, T)$ is a system of order 1, meaning it is a group
rotation. The measure $\nu^{[2]}$ is the Haar measure of the subgroup

$$\{(y, y + s, y + t, y + s + t) : y, s, t \in Y\}$$

of $Y^4$. The functional equation of Proposition 8.6 is: there exists $\Phi \colon Y^3 \to \mathbb{T}^d$ with

$$\rho(y) - \rho(y + s) - \rho(y + t) + \rho(y + s + t) = \Phi(y + 1, s, t) - \Phi(y, s, t) .$$

It follows that for every $s \in Y$, there exists $\phi_s \colon Y \to \mathbb{T}^d$ and $c_s \in \mathbb{T}^d$ satisfying
the Conze-Lesigne Equation (see [5]):

$$\text{(CL)} \qquad \rho(y) - \rho(y + s) = \phi_s(y + 1) - \phi_s(y) + c_s .$$

The group $\mathcal{G}(X)$ associated to the system is the group of transformations of
$X = Y \times \mathbb{T}^d$ of the form

$$(y, h) \mapsto (y + s, h + \phi_s(y))$$

where $s$ and $\phi_s$ satisfy (CL).

8.6. Structure theorem in general. We give a short outline of the steps
needed to complete the proof of the Structure Theorem for $k \ge 3$. We have that
$Y = Z_{k-1}(X)$ is a system of order $k - 1$, $X = Y \times \mathbb{T}^d$, $T(y, h) = (Ty, h + \rho(y))$,
and $\rho \colon Y \to \mathbb{T}^d$ satisfies the functional equation (8.1). By the induction hypothesis
$Y = \varprojlim Y_i$ where each $Y_i = G_i/\Gamma_i$ is a $(k-1)$-step nilsystem.

We first show that the cocycle $\rho$ is cohomologous to a cocycle measurable with
respect to $\mathcal{Y}_i$ for some $i$, meaning that the difference between the two cocycles is
a coboundary. This reduces us to the case that $\rho$ is measurable with respect to
some $\mathcal{Y}_i$, and so we can assume that $Y = Y_i$ for some $i$. Thus $Y$ is a $(k-1)$-step
nilsystem and we can assume that $Y = G/\Gamma$ with $G = \mathcal{G}(Y)$.

We then use the functional equation to lift every transformation $S \in G$ to a
transformation of $X$ belonging to $\mathcal{G}(X)$. Starting with the case $S \in G_{k-1}$, we move
up the lower central series of $G$. Lastly we show that we obtain sufficiently many
elements of the group $\mathcal{G}(X)$ in this way.

8.7. Relations to the finite case. The seminorms $|||\cdot|||_k$ play the same role
that the Gowers norms play in Gowers's proof [23] of Szemeredi's Theorem and
in Green and Tao's proof [25] that the primes contain arbitrarily long arithmetic
progressions. We let $U_k$ denote the $k$-th Gowers norm. For the finite system $\mathbb{Z}/N\mathbb{Z}$,
$|||f|||_k = \|f\|_{U_k}$. Furthermore, $\|\cdot\|_{U_k}$ is a norm, not only a seminorm. The analog of
Lemma 7.5 is that if $\|f_0\|_\infty, \|f_1\|_\infty, \dots, \|f_k\|_\infty \le 1$, then there exists some constant
$C_k > 0$ such that

$$\bigl| E\bigl(f_0(x) f_1(x + y) \cdots f_k(x + ky) \bigm| x, y \in \mathbb{Z}/p\mathbb{Z}\bigr) \bigr| \le C_k \min_j \|f_j\|_{U_k} .$$

Other parts of the program are not as easy to translate to the finite setting.
Consider defining a factor of the system using the seminorms. If $p$ is prime, then
$\mathbb{Z}/p\mathbb{Z}$ has no nontrivial factor and so there is no factor of $\mathbb{Z}/p\mathbb{Z}$ playing the role of
the factor $Z_k$, meaning there is no factor with

$$E(f \mid \mathcal{Z}_k) = 0 \text{ if and only if } \|f\|_{U_k} = 0 .$$

Instead, the corresponding results have a different flavor: if $\|f\|_{U_k}$ is large in some
sense, then $f$ has large conditional expectation on some (noninvariant) $\sigma$-algebra
or it has large correlation with a function of some particular class. Although we
have a complete characterization of the seminorms $|||\cdot|||_k$ (and so the factors $Z_k$) in
terms of nilmanifolds, there are only partial combinatorial characterizations in this
direction (see [26], [27] and [28]).

9. Other patterns 

9.1. Commuting transformations. Ergodic theory has been used to detect 
other patterns that occur in sets of positive upper density, using Furstenberg's 
Correspondence Principle and an appropriately chosen strengthening of Furstenberg 
multiple recurrence. A first example is for commuting transformations: 

Theorem 9.1 (Furstenberg and Katznelson [19]). Let $(X, \mathcal{X}, \mu)$ be a probability
measure space, let $k \ge 1$ be an integer, and assume that $T_j : X \to X$ are commuting
measure preserving transformations for $j = 1, 2, \ldots, k$. Then for all $A \in \mathcal{X}$ with
$\mu(A) > 0$, there exist infinitely many $n \in \mathbb{N}$ such that

(9.1) $$\mu(A \cap T_1^{-n}A \cap T_2^{-n}A \cap \cdots \cap T_k^{-n}A) > 0 .$$

(In [20], Furstenberg and Katznelson proved a strengthening of this result,
showing that one can place some restrictions on the choice of n; we do not discuss
these "IP" versions of the theorems given in the sequel.) Via correspondence, a
multidimensional version of Szemeredi's Theorem follows: if $E \subset \mathbb{Z}^r$ has positive
upper density and $F \subset \mathbb{Z}^r$ is a finite subset, then there exist $z \in \mathbb{Z}^r$ and $n \in \mathbb{N}$
such that $z + nF \subset E$.

Again, this theorem is proven by showing that the associated liminf of the
average of the quantity in Equation (9.1) is positive. Again, it is natural to ask if
the limit

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T_1^{-n}A \cap \cdots \cap T_k^{-n}A)$$

exists in $L^2(\mu)$ for commuting maps $T_1, \ldots, T_k$. Only partial results are known.
For $k = 2$, Conze and Lesigne ([8], [9]) proved convergence. For $k \ge 3$, the only
known results rely on strong hypotheses of ergodicity:

Theorem 9.2 (Frantzikinakis and Kra [13]). Let $k \in \mathbb{N}$ and assume that
$T_1, T_2, \ldots, T_k$ are commuting invertible ergodic measure preserving transformations
of a measure space $(X, \mathcal{X}, \mu)$ such that $T_i T_j^{-1}$ is ergodic for all $i, j \in \{1, 2, \ldots, k\}$
with $i \ne j$. If $f_1, f_2, \ldots, f_k \in L^\infty(\mu)$, then the averages

$$\frac{1}{N} \sum_{n=0}^{N-1} T_1^n f_1 \cdot T_2^n f_2 \cdots T_k^n f_k$$

converge in $L^2(\mu)$ as $N \to \infty$.

The idea is to prove an analog of Lemma 7.5 for commuting transformations,
thus reducing the problem to working in a nilsystem. The factors that are char- 
acteristic for averages along arithmetic progressions are also characteristic for these 
particular averages of commuting transformations. Without the strong hypotheses 
of ergodicity, this no longer holds and the general case remains open. 

9.2. Averages along cubes. Another type of average is along k-dimensional
cubes, the natural objects that arise in the definition of the seminorms. For example,
a 2-dimensional cube is an expression of the form:

$$f(x) f(T^m x) f(T^n x) f(T^{m+n} x) .$$

In [4], Bergelson showed the existence in $L^2(\mu)$ of

$$\lim_{N \to \infty} \frac{1}{N^2} \sum_{n,m=0}^{N-1} T^n f_1 \cdot T^m f_2 \cdot T^{n+m} f_3 ,$$

where $f_1, f_2, f_3 \in L^\infty(\mu)$. Similarly, one can define a 3-dimensional cube:

$$f_1(T^m x) f_2(T^n x) f_3(T^{m+n} x) f_4(T^p x) f_5(T^{m+p} x) f_6(T^{n+p} x) f_7(T^{m+n+p} x)$$

and existence of the limit of the average of this expression in $L^2(\mu)$ for bounded
functions $f_1, f_2, \ldots, f_7$ was shown in [33].

More generally, this theorem holds for cubes of $2^k - 1$ functions. Recalling the
notation of Section 7, we have for $\epsilon = \epsilon_1 \cdots \epsilon_k \in \{0,1\}^k$ and $n = (n_1, \ldots, n_k) \in \mathbb{Z}^k$,

$$\epsilon \cdot n = \epsilon_1 n_1 + \epsilon_2 n_2 + \cdots + \epsilon_k n_k ,$$

and $0$ denotes the element $00 \ldots 0$ of $\{0,1\}^k$. We have:

Theorem 9.3 (Host and Kra p2|). Let $(X, \mathcal{X}, \mu, T)$ be a system, let $k \ge 1$ be
an integer, and let $f_\epsilon$, $\epsilon \in \{0,1\}^k \setminus \{0\}$, be $2^k - 1$ bounded functions on $X$. Then
the averages

$$\frac{1}{N^k} \sum_{n \in [0, N-1]^k} \prod_{\epsilon \in \{0,1\}^k,\ \epsilon \ne 0} T^{\epsilon \cdot n} f_\epsilon$$

converge in $L^2(\mu)$ as $N \to \infty$.

The same result holds for translated averages, meaning the average for $n \in
[M_1, N_1] \times \cdots \times [M_k, N_k]$, as $N_1 - M_1, \ldots, N_k - M_k \to \infty$.

By Furstenberg's Correspondence Principle, this translates to a combinatorial
statement. A subset $E \subset \mathbb{Z}$ is syndetic if $\mathbb{Z}$ can be covered by finitely many
translates of $E$. In other words, there exists $N > 0$ such that every interval of size
$N$ contains at least one element of $E$. (Thus it is natural to refer to a syndetic set
in the integers as a set with bounded gaps.) More generally, $E \subset \mathbb{Z}^k$ is syndetic if
there exists an integer $N > 0$ such that

$$E \cap ([M_1, M_1 + N] \times \cdots \times [M_k, M_k + N]) \ne \emptyset$$

for all $M_1, \ldots, M_k \in \mathbb{Z}$.

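Restricting Theorem 9.3 to indicator functions, the limit of the averages

$$\prod_{i=1}^{k} \frac{1}{N_i - M_i} \sum_{n_1 \in [M_1, N_1], \ldots, n_k \in [M_k, N_k]} \mu\Big(\bigcap_{\epsilon \in \{0,1\}^k} T^{\epsilon \cdot n} A\Big)$$

exists and is greater than or equal to $\mu(A)^{2^k}$ when $N_1 - M_1, \ldots, N_k - M_k \to \infty$.
Thus for every $\varepsilon > 0$, the subset

$$\Big\{ n \in \mathbb{Z}^k : \mu\Big(\bigcap_{\epsilon \in \{0,1\}^k} T^{\epsilon \cdot n} A\Big) > \mu(A)^{2^k} - \varepsilon \Big\}$$

of $\mathbb{Z}^k$ is syndetic.

By the Correspondence Principle, we have that if $E \subset \mathbb{Z}$ has upper density
$d^*(E) > \delta > 0$ and $k \in \mathbb{N}$, then

$$\Big\{ n \in \mathbb{Z}^k : d^*\Big(\bigcap_{\epsilon \in \{0,1\}^k} (E + \epsilon \cdot n)\Big) > \delta^{2^k} \Big\}$$

is syndetic.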

9.3. Polynomial patterns. In a different direction, one can restrict the it- 
erates arising in Furstenberg's multiple recurrence. A natural choice is polynomial 
iterates, and the corresponding combinatorial statement is that a set of integers 
with positive upper density contains two elements whose difference is a value of a polynomial:

Theorem 9.4 (Sarkozy [51], Furstenberg [17]). If $E \subset \mathbb{N}$ has positive upper
density and $p : \mathbb{Z} \to \mathbb{Z}$ is a polynomial with $p(0) = 0$, then there exist $x, y \in E$ and
$n \in \mathbb{N}$ such that $x - y = p(n)$.
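On a finite set, the conclusion of this theorem can be searched for directly. The sketch below is only an illustration: the function name `polynomial_difference` and the cutoff `n_max` are assumptions introduced here, and a brute-force search on a finite set says nothing about upper density.

```python
def polynomial_difference(E, p, n_max):
    """Search for x, y in E and 1 <= n <= n_max with x - y == p(n).

    Returns the first triple (x, y, n) found, or None. Values of n with
    p(n) == 0 are skipped so that the trivial solution x == y is not reported.
    """
    members = set(E)
    for n in range(1, n_max + 1):
        d = p(n)
        if d == 0:
            continue
        for y in sorted(members):
            if y + d in members:
                return (y + d, y, n)
    return None

# Multiples of 3 up to 30, with p(n) = n^2 (so p(0) = 0).
print(polynomial_difference(range(3, 31, 3), lambda n: n * n, 5))
```

For the multiples of 3, the differences 1 and 4 never occur, so the first hit uses the difference 9 = 3^2.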

As for arithmetic progressions, Furstenberg's proof relies on an averaging the- 
orem: 
Theorem 9.5 (Furstenberg [E]). Let $(X, \mathcal{X}, \mu, T)$ be a system, let $A \in \mathcal{X}$ with
$\mu(A) > 0$ and let $p : \mathbb{Z} \to \mathbb{Z}$ be a polynomial with $p(0) = 0$. Then

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T^{-p(n)}A) > 0 .$$

The multiple polynomial recurrence theorem, simultaneously generalizing this 
single polynomial result and Furstenberg's multiple recurrence, was proven by 
Bergelson and Leibman: 

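Theorem 9.6 (Bergelson and Leibman [6]). Let $(X, \mathcal{X}, \mu, T)$ be a system, let
$A \in \mathcal{X}$ with $\mu(A) > 0$, and let $k \in \mathbb{N}$. If $p_1, p_2, \ldots, p_k : \mathbb{Z} \to \mathbb{Z}$ are polynomials with
$p_j(0) = 0$ for $j = 1, \ldots, k$, then

(9.2) $$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T^{-p_1(n)}A \cap \cdots \cap T^{-p_k(n)}A) > 0 .$$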



By the Correspondence Principle, one immediately deduces a polynomial
Szemeredi Theorem: if $E \subset \mathbb{Z}$ has positive upper density, then it contains arbitrary
polynomial patterns, meaning there exist $x \in \mathbb{Z}$ and $n \in \mathbb{N}$ such that

$$x, x + p_1(n), x + p_2(n), \ldots, x + p_k(n) \in E .$$

(More generally, Bergelson and Leibman proved a version of Theorem 9.6 for commuting
transformations, with a multidimensional polynomial Szemeredi Theorem
as a corollary.)

Again, it is natural to ask if the liminf in (9.2) is actually a limit. A first result
in this direction was given by Furstenberg and Weiss [22], who proved convergence
in $L^2(\mu)$ of

$$\frac{1}{N} \sum_{n=0}^{N-1} T^{n} f_1 \cdot T^{n^2} f_2$$

and

$$\frac{1}{N} \sum_{n=0}^{N-1} T^{n^2} f_1 \cdot T^{n^2+n} f_2$$

for bounded functions $f_1, f_2$.

The proof of convergence for general polynomial averages uses the technology of
the seminorms, reducing to the same characteristic factors $Z_k$ that can be described
using nilsystems, as for averages along arithmetic progressions:

Theorem 9.7 (Host and Kra [35], Leibman [ISj). Let $(X, \mathcal{X}, \mu, T)$ be a system,
$k \in \mathbb{N}$, and $f_1, f_2, \ldots, f_k \in L^\infty(\mu)$. Then for any polynomials $p_1, p_2, \ldots, p_k : \mathbb{Z} \to \mathbb{Z}$,
the averages

$$\frac{1}{N} \sum_{n=0}^{N-1} T^{p_1(n)} f_1 \cdot T^{p_2(n)} f_2 \cdots T^{p_k(n)} f_k$$

converge in $L^2(\mu)$.



Recently, Johnson [36] has shown that under similar strong ergodicity conditions
to those in Theorem 9.2, one can generalize this and prove $L^2(\mu)$-convergence
of the polynomial averages for commuting transformations:

$$\frac{1}{N} \sum_{n=0}^{N-1} T_1^{p_1(n)} f_1 \cdot T_2^{p_2(n)} f_2 \cdots T_k^{p_k(n)} f_k .$$

For a totally ergodic system (meaning that $T^n$ is ergodic for all $n \in \mathbb{N}$), Furstenberg
and Weiss showed a stronger result, giving an explicit and simple formula for
the limit:

$$\frac{1}{N} \sum_{n=0}^{N-1} T^n f_1 \cdot T^{n^2} f_2 \to \int f_1 \, d\mu \int f_2 \, d\mu$$

in $L^2(\mu)$.



Bergelson [3] asked whether the same result holds for k polynomials of different
degrees, meaning that the limit of the polynomial average for a totally ergodic
system is the product of the integrals. We show that the answer is yes under a more
general condition. A family of polynomials $p_1, p_2, \ldots, p_k : \mathbb{Z} \to \mathbb{Z}$ is rationally
independent if for all integers $m_1, \ldots, m_k$ with at least one $m_j \ne 0$, the polynomial
$\sum_{j=1}^{k} m_j p_j(n)$ is not constant. We show:

Theorem 9.8 (Frantzikinakis and Kra [12]). Let $(X, \mathcal{X}, \mu, T)$ be a totally ergodic
system, let $k \ge 1$ be an integer, and assume that $p_1, p_2, \ldots, p_k : \mathbb{Z} \to \mathbb{Z}$ are
rationally independent polynomials. If $f_1, f_2, \ldots, f_k \in L^\infty(\mu)$, then

$$\lim_{N \to \infty} \Big\| \frac{1}{N} \sum_{n=0}^{N-1} T^{p_1(n)} f_1 \cdot T^{p_2(n)} f_2 \cdots T^{p_k(n)} f_k - \prod_{i=1}^{k} \int f_i \, d\mu \Big\|_{L^2(\mu)} = 0 .$$



As a corollary, if $(X, \mathcal{X}, \mu, T)$ is totally ergodic, $\{p_1, \ldots, p_k\}$ are rationally
independent polynomials taking on integer values on the integers, and $A_0, A_1, \ldots, A_k \in
\mathcal{X}$ with $\mu(A_i) > 0$, $i = 0, \ldots, k$, then

$$\mu(A_0 \cap T^{-p_1(n)}A_1 \cap \cdots \cap T^{-p_k(n)}A_k) > 0$$

for some $n \in \mathbb{N}$. Thus in a totally ergodic system, one can strengthen Bergelson
and Leibman's multiple polynomial recurrence theorem, allowing the sets $A_i$ to be
distinct, and allowing the polynomials $p_i$ to have nonzero constant term. It is not
clear if this has a combinatorial interpretation.
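The rational independence hypothesis used in this subsection can be tested mechanically. The sketch below rests on an observation that is an assumption of this illustration rather than a statement from the text: for polynomials with rational coefficients, some nontrivial integer combination is constant exactly when the coefficient vectors, with the constant terms dropped, are linearly dependent over the rationals. The function name `rationally_independent` and the coefficient-list encoding are hypothetical.

```python
from fractions import Fraction

def rationally_independent(polys):
    """polys: list of coefficient lists [c0, c1, c2, ...], c_i multiplying n**i.

    Returns True when no nontrivial integer combination of the polynomials
    is constant, i.e. when the non-constant parts have full rank over Q.
    """
    width = max(len(c) for c in polys)
    # Drop the constant term c0 of each polynomial and pad with zeros.
    rows = [[Fraction(c[i]) if i < len(c) else Fraction(0)
             for i in range(1, width)] for c in polys]
    rank = 0
    for col in range(width - 1):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank == len(polys)

print(rationally_independent([[0, 1], [0, 0, 1]]))   # n and n^2
print(rationally_independent([[0, 1], [0, 2]]))      # n and 2n
```

Exact `Fraction` arithmetic is used so that the rank computation is not affected by floating-point rounding.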



10. Strengthening Poincare recurrence 

10.1. Khintchine recurrence. Poincare recurrence states that a set of pos- 
itive measure returns to intersect itself infinitely often. One way to strengthen this 
is to ask that the set return to itself often with 'large' intersection. Khintchine 
made this notion precise, showing that large self intersection occurs on a syndetic 
set: 

Theorem 10.1 (Khintchine PS]). Let $(X, \mathcal{X}, \mu, T)$ be a system, let $A \in \mathcal{X}$ have
$\mu(A) > 0$, and let $\varepsilon > 0$. Then

$$\{n \in \mathbb{Z} : \mu(A \cap T^n A) > \mu(A)^2 - \varepsilon\}$$

is syndetic.
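A concrete instance may help to see what the theorem asserts. The sketch below is an illustration under assumptions not taken from the text: the system is the rotation $x \mapsto x + \alpha$ on the circle R/Z with Lebesgue measure, $A = [0, a)$ is an arc with $a \le 1/2$, and only a finite window of times n is examined, so the bounded gaps are observed rather than proved. The function names are hypothetical.

```python
from math import sqrt

def overlap(a, t):
    """Lebesgue measure of the intersection of [0, a) and [0, a) + t on R/Z, for 0 < a <= 1/2."""
    d = t % 1.0
    d = min(d, 1.0 - d)          # circular distance from t to 0
    return max(0.0, a - d)

def return_times(alpha, a, eps, n_max):
    """All 0 <= n < n_max with mu(A cap T^n A) > mu(A)^2 - eps, for A = [0, a)."""
    return [n for n in range(n_max) if overlap(a, n * alpha) > a * a - eps]

alpha = (sqrt(5.0) - 1.0) / 2.0   # an irrational rotation number, chosen for the demo
hits = return_times(alpha, a=0.3, eps=0.01, n_max=2000)
gaps = [q - p for p, q in zip(hits, hits[1:])]
print(hits[:7], max(gaps))
```

Here the condition reduces to requiring that $n\alpha$ lie within 0.22 of an integer, so the set of good times is a Bohr set and its gaps stay bounded on the window.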
It is natural to ask for a simultaneous generalization of Furstenberg Multiple
Recurrence and Khintchine Recurrence. More precisely, if $(X, \mathcal{X}, \mu, T)$ is a system,
$A \in \mathcal{X}$ has positive measure, $k \in \mathbb{N}$, and $\varepsilon > 0$, is the set

$$\{n \in \mathbb{Z} : \mu(A \cap T^n A \cap \cdots \cap T^{kn} A) > \mu(A)^{k+1} - \varepsilon\}$$

syndetic?

Furstenberg Multiple Recurrence implies that there exists some constant $c =
c(\mu(A)) > 0$ such that

$$\{n \in \mathbb{Z} : \mu(A \cap T^n A \cap \cdots \cap T^{kn} A) > c\}$$

is syndetic. But to generalize Khintchine Recurrence, one needs $c = \mu(A)^{k+1}$. It
turns out that the answer depends on the length k of the arithmetic progression.

Theorem 10.2 (Bergelson, Host and Kra [5]). Let $(X, \mathcal{X}, \mu, T)$ be an ergodic
system and let $A \in \mathcal{X}$. Then for every $\varepsilon > 0$, the sets

$$\{n \in \mathbb{Z} : \mu(A \cap T^n A \cap T^{2n} A) > \mu(A)^3 - \varepsilon\}$$

and

$$\{n \in \mathbb{Z} : \mu(A \cap T^n A \cap T^{2n} A \cap T^{3n} A) > \mu(A)^4 - \varepsilon\}$$

are syndetic.

Furthermore, this result fails on average, meaning that the average of the left
hand side expressions is not greater than $\mu(A)^3 - \varepsilon$ or $\mu(A)^4 - \varepsilon$, respectively.

On the other hand, based on an example of Ruzsa contained in the appendix 
of [5], we have: 

Theorem 10.3 (Bergelson, Host and Kra [5]). There exists an ergodic system
$(X, \mathcal{X}, \mu, T)$ and for all $\ell \in \mathbb{N}$ there exists a set $A = A(\ell) \in \mathcal{X}$ with $\mu(A) > 0$ such
that

$$\mu(A \cap T^n A \cap T^{2n} A \cap T^{3n} A \cap T^{4n} A) < \mu(A)^\ell / 2$$

for every integer $n \ne 0$.

We now briefly outline the major ingredients in the proofs of these theorems. 

10.2. Positive ergodic results. We start with the ergodic results needed to
prove Theorem 10.2. Fix an integer $k \ge 1$, an ergodic system $(X, \mathcal{X}, \mu, T)$, and
$A \in \mathcal{X}$ with $\mu(A) > 0$. The key ingredient is the study of the multicorrelation
sequence

$$\mu(A \cap T^n A \cap T^{2n} A \cap \cdots \cap T^{kn} A) .$$

More generally, for a real valued function $f \in L^\infty(\mu)$, we consider the multicorrelation
sequence

$$I_f(k, n) := \int f \cdot T^n f \cdot T^{2n} f \cdots T^{kn} f \, d\mu(x) .$$

When $k = 1$, Herglotz's Theorem implies that the correlation sequence $I_f(1, n)$
is the Fourier transform of some positive measure $\sigma = \sigma_f$ on the torus $\mathbb{T}$:

$$I_f(1, n) = \widehat{\sigma}(n) := \int_{\mathbb{T}} e^{2\pi i n t} \, d\sigma(t) .$$

Decomposing the measure $\sigma$ into its continuous part $\sigma^c$ and its discrete part $\sigma^d$,
we can write the multicorrelation sequence $I_f(1, n)$ as the sum of two sequences

$$I_f(1, n) = \widehat{\sigma^c}(n) + \widehat{\sigma^d}(n) .$$
The sequence $\{\widehat{\sigma^c}(n)\}$ tends to 0 in density, meaning that

$$\lim_{N \to \infty} \sup_{M \in \mathbb{Z}} \frac{1}{N} \sum_{n=M}^{M+N-1} |\widehat{\sigma^c}(n)| = 0 .$$

Equivalently, for any $\varepsilon > 0$, the upper Banach density$^3$ of the set $\{n \in \mathbb{Z} : |\widehat{\sigma^c}(n)| >
\varepsilon\}$ is zero. The sequence $\{\widehat{\sigma^d}(n)\}$ is almost periodic, meaning that there exists a
compact abelian group $G$, a continuous real valued function $\phi$ on $G$, and $a \in G$
such that $\widehat{\sigma^d}(n) = \phi(a^n)$ for all $n$.

A compact abelian group can be approximated by a compact abelian Lie group. 
Thus any almost periodic sequence can be uniformly approximated by an almost 
periodic sequence arising from a compact abelian Lie group. 

In general, however, for higher k the answer is more complicated. We find a
similar decomposition for the multicorrelation sequences $I_f(k, n)$ for $k \ge 2$. The
notion of an almost periodic sequence is replaced by that of a nilsequence: for an
integer $k \ge 2$, a k-step nilmanifold $X = G/\Gamma$, a continuous real (or complex) valued
function $\phi$ on $X$, $a \in G$, and $e \in X$, the sequence $\{\phi(a^n \cdot e)\}$ is called a basic k-step
nilsequence. A k-step nilsequence is a uniform limit of basic k-step nilsequences.

It follows that a 1-step nilsequence is the same as an almost periodic sequence.
An inverse limit of compact abelian Lie groups is a compact group. However, an
inverse limit of k-step nilmanifolds is not, in general, the homogeneous space of
some locally compact group, and so for higher k, the decomposition result must
take into account the uniform limits of basic nilsequences. We have:

Theorem 10.4 (Bergelson, Host and Kra [5]). Let $(X, \mathcal{X}, \mu, T)$ be an ergodic
system, $f \in L^\infty(\mu)$ and $k \ge 1$ an integer. The sequence $\{I_f(k, n)\}$ is the sum of a
sequence tending to zero in density and a k-step nilsequence.

Finally, we explain how this result can be used to prove Theorem 10.2. Let
$\{a_n\}_{n \in \mathbb{Z}}$ be a bounded sequence of real numbers. The syndetic supremum of this
sequence is defined to be

$$\sup \{ c \in \mathbb{R} : \{n \in \mathbb{Z} : a_n > c\} \text{ is syndetic} \} .$$

Every nilsequence $\{a_n\}$ is uniformly recurrent. In particular, if $S = \sup(a_n)$ and
$\varepsilon > 0$, then $\{n \in \mathbb{Z} : a_n > S - \varepsilon\}$ is syndetic.

If $\{a_n\}$ and $\{b_n\}$ are two sequences of real numbers such that $a_n - b_n$ tends to
0 in density, then the two sequences have the same syndetic supremum. Therefore
the syndetic supremums of the sequences

$$\{\mu(A \cap T^n A \cap T^{2n} A)\}$$

and

$$\{\mu(A \cap T^n A \cap T^{2n} A \cap T^{3n} A)\}$$

are equal to the supremum of the associated nilsequences, and we are reduced to
showing that they are greater than or equal to $\mu(A)^3$ and $\mu(A)^4$, respectively.



$^3$ The upper Banach density $d(E)$ of a set $E \subset \mathbb{Z}$ is defined by $d(E) = \lim_{N \to \infty} \sup_{M \in \mathbb{Z}} \frac{1}{N} |E \cap
[M, M + N - 1]|$.
10.3. Nonergodic counterexample. Ergodicity is not needed for Khintchine's
Theorem, but is essential for Theorem 10.2.

Theorem 10.5 (Bergelson, Host, and Kra [5]). There exists a (nonergodic)
system $(X, \mathcal{X}, \mu, T)$, and for every $\ell \in \mathbb{N}$ there exists $A \in \mathcal{X}$ with $\mu(A) > 0$ such
that

$$\mu(A \cap T^n A \cap T^{2n} A) < \frac{1}{2} \mu(A)^\ell$$

for every integer $n \ne 0$.

Actually there exists a set $A$ of arbitrarily small positive measure with

$$\mu(A \cap T^n A \cap T^{2n} A) < \mu(A)^{-c \log(\mu(A))}$$

for every integer $n \ne 0$ and for some positive universal constant $c$.

The proof is based on Behrend's construction of a set containing no arithmetic 
progression of length 3: 

Theorem 10.6 (Behrend [J). For all $L \in \mathbb{N}$, there exists a subset $E \subset
\{0, 1, \ldots, L-1\}$ having more than $L \exp(-c\sqrt{\log L})$ elements that does not contain
any nontrivial arithmetic progression of length 3.

Proof (of Theorem 10.5). Let $X = \mathbb{T} \times \mathbb{T}$, with Haar measure $\mu = m \times m$ and
transformation $T : X \to X$ given by $T(x, y) = (x, y + x)$.

Let $E \subset \{0, 1, \ldots, L-1\}$ be a set not containing any nontrivial arithmetic progression
of length 3. Define

which we consider as a subset of the torus and A = T x B. 

For every integer $n \ne 0$, we have $T^n(x, y) = (x, y + nx)$ and

$$\mu(A \cap T^n A \cap T^{2n} A) = \iint_{\mathbb{T} \times \mathbb{T}} 1_B(y) 1_B(y + nx) 1_B(y + 2nx) \, dm(y) \, dm(x)
= \iint_{\mathbb{T} \times \mathbb{T}} 1_B(y) 1_B(y + x) 1_B(y + 2x) \, dm(y) \, dm(x) .$$

Bounding this integral, we have that:

$$\mu(A \cap T^n A \cap T^{2n} A) = \iint_{\mathbb{T} \times \mathbb{T}} 1_B(y) 1_B(y + x) 1_B(y + 2x) \, dm(x) \, dm(y) \le \frac{m(B)}{4L} .$$

By Behrend's Theorem, we can choose the set $E$ with cardinality on the order
of $L \exp(-c\sqrt{\log L})$. Choosing $L$ sufficiently large, a simple computation gives the
statement. □

For longer arithmetic progressions, the counterexample of Theorem 10.3 is
based on a construction of Ruzsa. When $P$ is a nonconstant integer polynomial of
degree $\le 2$, the subset

$$\{P(0), P(1), P(2), P(3), P(4)\}$$

of $\mathbb{Z}$ is called a quadratic configuration of 5 terms, written QC5 for short.

Any QC5 contains at least 3 distinct elements. An arithmetic progression of 
length 5 is a QC5, corresponding to a polynomial of degree 1. 
Theorem 10.7 (Ruzsa [1]). For all $L \in \mathbb{N}$, there exists a subset $E \subset \{0, 1, \ldots,
L-1\}$ having more than $L \exp(-c\sqrt{\log L})$ elements that does not contain any QC5.

Based on this, we show: 

Theorem 10.8 (Bergelson, Host and Kra [5]). There exists an ergodic system
$(X, \mathcal{X}, \mu, T)$ and, for every $\ell \in \mathbb{N}$, there exists $A \in \mathcal{X}$ with $\mu(A) > 0$ such that

$$\mu(A \cap T^n A \cap T^{2n} A \cap T^{3n} A \cap T^{4n} A) < \frac{1}{2} \mu(A)^\ell$$

for every integer $n \ne 0$.

Once again, the proof gives the estimate $\mu(A)^{-c \log(\mu(A))}$, for some constant $c > 0$.

The construction again involves a simple example: $\mathbb{T}$ is the torus with Haar
measure $m$, $X = \mathbb{T} \times \mathbb{T}$, and $\mu = m \times m$. Let $\alpha \in \mathbb{T}$ be irrational and let $T : X \to X$
be

$$T(x, y) = (x + \alpha, y + 2x + \alpha) .$$

Combinatorially this example becomes: for all $k \in \mathbb{N}$, there exists $\delta > 0$ such
that for infinitely many integers $N$, there is a subset $A \subset \{1, \ldots, N\}$ with $|A| > \delta N$
that contains no more than $\frac{1}{2} \delta^k N$ arithmetic progressions of length $\ge 5$ with the
same difference.

10.4. Combinatorial consequences. Via a slight modification of the Corre- 
spondence Principle, each of these results translates to a combinatorial statement. 
For $\varepsilon > 0$ and $E \subset \mathbb{Z}$ with positive upper Banach density, consider the set

(10.1) $$\{n \in \mathbb{Z} : d(E \cap (E + n) \cap (E + 2n) \cap \cdots \cap (E + kn)) > d(E)^{k+1} - \varepsilon\} .$$

From Theorems 10.2 and 10.3, for $k = 2$ and for $k = 3$, this set is syndetic, while
for $k \ge 4$ there exists a set of integers $E$ with positive upper Banach density such
that the set in (10.1) is empty.

We can refine this a bit further. Recall the notation from Szemeredi's Theorem:
for every $\delta > 0$ and $k \in \mathbb{N}$, there exists $N(\delta, k)$ such that for all $N > N(\delta, k)$, every
subset of $\{1, \ldots, N\}$ with at least $\delta N$ elements contains an arithmetic progression
of length $k$.

For an arithmetic progression $\{a, a + s, \ldots, a + (k-1)s\}$, $s$ is the difference of
the progression. Write $\lfloor x \rfloor$ for the integer part of $x$. From Szemeredi's Theorem, we
can deduce that every subset $E$ of $\{1, \ldots, N\}$ with at least $\delta N$ elements contains at
least $\lfloor cN^2 \rfloor$ arithmetic progressions of length $k$, where $c = c(k, \delta) > 0$ is a constant.
Therefore the set $E$ contains at least $\lfloor c(k, \delta) N \rfloor$ progressions of length $k$ with the
same difference.
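The last step is a pigeonhole argument: a progression of length k inside {1, ..., N} has difference at most (N - 1)/(k - 1), so if there are about cN^2 progressions in total, some single difference must account for on the order of N of them. The sketch below is an illustration only, with a hypothetical function name; it tabulates, for a finite set, the number of k-term progressions for each difference, assuming k is at least 2.

```python
from collections import Counter

def progressions_by_difference(E, k):
    """Map each difference s >= 1 to the number of k-term progressions
    a, a + s, ..., a + (k-1)s contained in E (k >= 2 is assumed)."""
    members = set(E)
    counts = Counter()
    if not members:
        return counts
    span = max(members) - min(members)
    for a in members:
        for s in range(1, span // (k - 1) + 1):
            if all(a + j * s in members for j in range(k)):
                counts[s] += 1
    return counts

counts = progressions_by_difference(range(1, 11), 3)
best_difference, best_count = counts.most_common(1)[0]
print(dict(counts), best_difference, best_count)
```

For the full interval {1, ..., 10} and k = 3, the difference s contributes 10 - 2s progressions, so the most popular difference is s = 1.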

The ergodic results of Theorem 10.2 give some improvement for $k = 3$ and
$k = 4$ (see [1] for the precise statement). For $k = 3$, this was strengthened by
Green:

Theorem 10.9 (Green [24]). For all $\delta, \varepsilon > 0$, there exists $N_0(\delta, \varepsilon)$ such that
for all $N > N_0(\delta, \varepsilon)$ and any $E \subset \{1, \ldots, N\}$ with $|E| > \delta N$, $E$ contains at least
$(1 - \varepsilon)\delta^3 N$ arithmetic progressions of length 3 with the same difference.

On the other hand, the similar bound for longer progressions with length $k \ge 5$
does not hold. The proof in [5], based on an example of Ruzsa, does not use ergodic
theory. We show that for all $k \in \mathbb{N}$, there exists $\delta > 0$ such that for infinitely many
$N$, there exists a subset $E$ of $\{1, \ldots, N\}$ with $|E| > \delta N$ that contains no more than
$\frac{1}{2} \delta^k N$ arithmetic progressions of length $\ge 5$ with the same step.
10.5. Polynomial averages. One can ask if similar lower bounds hold for
the polynomial averages. For independent polynomials, using the fact that the
characteristic factor is the Kronecker factor, we can show:

Theorem 10.10 (Frantzikinakis and Kra [H]). Let $k \in \mathbb{N}$, $(X, \mathcal{X}, \mu, T)$ be a
system, $A \in \mathcal{X}$, and let $p_1, p_2, \ldots, p_k : \mathbb{Z} \to \mathbb{Z}$ be rationally independent polynomials
with $p_i(0) = 0$ for $i = 1, 2, \ldots, k$. Then for every $\varepsilon > 0$, the set

$$\{n \in \mathbb{Z} : \mu(A \cap T^{p_1(n)} A \cap T^{p_2(n)} A \cap \cdots \cap T^{p_k(n)} A) > \mu(A)^{k+1} - \varepsilon\}$$

is syndetic.

Once again, this result fails on average. 

Via Correspondence, analogous to the results of (10.1), we have that for $E \subset \mathbb{Z}$
and rationally independent polynomials $p_1, p_2, \ldots, p_k : \mathbb{Z} \to \mathbb{Z}$ with $p_i(0) = 0$ for
$i = 1, 2, \ldots, k$, and for all $\varepsilon > 0$, the set

$$\{n \in \mathbb{Z} : d(E \cap (E + p_1(n)) \cap \cdots \cap (E + p_k(n))) > d(E)^{k+1} - \varepsilon\}$$

is syndetic.

Moreover, in [14] we strengthen this and show that there are many configurations
with the same $n$ giving the differences: if $p_1, p_2, \ldots, p_k : \mathbb{Z} \to \mathbb{Z}$ are rationally
independent polynomials with $p_i(0) = 0$ for $i = 1, 2, \ldots, k$, then for all $\delta, \varepsilon > 0$,
there exists $N(\delta, \varepsilon)$ such that for all $N > N(\delta, \varepsilon)$, any subset $E \subset \{1, \ldots, N\}$
with $|E| > \delta N$ contains at least $(1 - \varepsilon)\delta^{k+1} N$ configurations of the form

$$\{x, x + p_1(n), x + p_2(n), \ldots, x + p_k(n)\}$$

for a fixed $n \in \mathbb{N}$.